Aim 1 Analysis: Characterizing the Dynamics of Our Three Populations Using Transcriptomic Analysis

In this notebook, we will complete our analysis of Day 7 to Day 13 of the CD34+ cells in LEM.

The Research Questions of this Aim are:

  1. Are There Unique Regions in Each of the 3 Populations of the GeneSpace?

  2. What is the genetic makeup of regions unique to a given population? Does this have functional implications?

  3. Is There Statistically Significant Evidence of Mast Cell Commitment Among These Populations?

Pre-Processing Workflow

Here, we begin by loading in necessary libraries, cleaning our data up to remove outliers from the UMAP space, and visualizing the general UMAP results.

Let's begin with loading in the necessary libraries. The comments indicate what each library allows us to do in this workflow.

In [1]:
# Loading Libraries

library(BiocSingular) # We need this to use the BioConductor libraries that work on the Single Cell data. #nolint
library(SingleCellExperiment) # We need this to use the SingleCellExperiment data structure.  # nolint
library(ggplot2) # we need this to make ggplot visualizations #nolint
library(tidyr) # we need this to manipulate data #nolint
library(dplyr) # we need this to manipulate data #nolint
library(patchwork) # to display plots side by side. #nolint
library(ggforce) # Allows me to display circles on ggplots. #nolint
library(limma) # helps with differential expression analysis #nolint
library(IRdisplay) # lets me display JPEGs in the notebook #nolint
library(org.Hs.eg.db) # lets me do gene annotation #nolint
library(clusterProfiler) # lets me do gene set enrichment analysis #nolint

# library(scuttle)
# library(scran)
# library(scater)
# library(scDblFinder)
# library(DropletUtils)
# library(DropletTestFiles)
# library(uwot)
# library(rtracklayer)
# library(PCAtools)
# library(celldex)
# library(SingleR)
# library(batchelor)
# library(bluster)
Loading required package: SummarizedExperiment

Loading required package: MatrixGenerics

Loading required package: matrixStats


Attaching package: 'MatrixGenerics'


The following objects are masked from 'package:matrixStats':

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
    colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
    colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
    colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
    colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
    colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
    colWeightedMeans, colWeightedMedians, colWeightedSds,
    colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
    rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
    rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
    rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
    rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
    rowWeightedSds, rowWeightedVars


Loading required package: GenomicRanges

Loading required package: stats4

Loading required package: BiocGenerics


Attaching package: 'BiocGenerics'


The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs


The following objects are masked from 'package:base':

    anyDuplicated, aperm, append, as.data.frame, basename, cbind,
    colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
    get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
    Position, rank, rbind, Reduce, rownames, sapply, saveRDS, setdiff,
    table, tapply, union, unique, unsplit, which.max, which.min


Loading required package: S4Vectors


Attaching package: 'S4Vectors'


The following object is masked from 'package:utils':

    findMatches


The following objects are masked from 'package:base':

    expand.grid, I, unname


Loading required package: IRanges


Attaching package: 'IRanges'


The following object is masked from 'package:grDevices':

    windows


Loading required package: GenomeInfoDb

Loading required package: Biobase

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.



Attaching package: 'Biobase'


The following object is masked from 'package:MatrixGenerics':

    rowMedians


The following objects are masked from 'package:matrixStats':

    anyMissing, rowMedians



Attaching package: 'tidyr'


The following object is masked from 'package:S4Vectors':

    expand



Attaching package: 'dplyr'


The following object is masked from 'package:Biobase':

    combine


The following objects are masked from 'package:GenomicRanges':

    intersect, setdiff, union


The following object is masked from 'package:GenomeInfoDb':

    intersect


The following objects are masked from 'package:IRanges':

    collapse, desc, intersect, setdiff, slice, union


The following objects are masked from 'package:S4Vectors':

    first, intersect, rename, setdiff, setequal, union


The following objects are masked from 'package:BiocGenerics':

    combine, intersect, setdiff, union


The following object is masked from 'package:matrixStats':

    count


The following objects are masked from 'package:stats':

    filter, lag


The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union



Attaching package: 'limma'


The following object is masked from 'package:BiocGenerics':

    plotMA


Loading required package: AnnotationDbi


Attaching package: 'AnnotationDbi'


The following object is masked from 'package:dplyr':

    select






clusterProfiler v4.14.4 Learn more at https://yulab-smu.top/contribution-knowledge-mining/

Please cite:

T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan,
X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal
enrichment tool for interpreting omics data. The Innovation. 2021,
2(3):100141


Attaching package: 'clusterProfiler'


The following object is masked from 'package:AnnotationDbi':

    select


The following object is masked from 'package:IRanges':

    slice


The following object is masked from 'package:S4Vectors':

    rename


The following object is masked from 'package:stats':

    filter


Based on the process outlined in d0_Analysis.ipynb, I've created SingleCellExperiment objects containing experimental data from the d7-d13 timepoints.

In [2]:
load("data/phenotype_with_ID.RData")
load("data/merge2.RData")

We have now loaded the pre-processed merge2 data and the associated phenotypes. Next, I will create an index linking cells to their phenotype. This will allow me to connect them to their flow cytometry populations.

In [3]:
pheno.d7 <- rep("CD34+CD45RA-CLEC12A-", 3039)
names(pheno.d7) <- colnames(merge2)[1:3039]

pheno.merge2 <- c(pheno.d7, pheno.d10, pheno.d13)
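The concatenation above only works if the phenotype vectors are in the same order as the columns of merge2. Here is a minimal sketch of a sanity check for that, using toy stand-ins (toy_cells, toy_pheno and check_alignment are hypothetical names, not objects from this notebook):

```r
# Toy stand-in for colnames() of an SCE object
toy_cells <- c("cell_a", "cell_b", "cell_c", "cell_d")

# Named phenotype vector, one entry per cell (names are the cell names)
toy_pheno <- c(
  cell_a = "CD34+CD45RA-CLEC12A-",
  cell_b = "CD34+CD45RA-CLEC12A-",
  cell_c = "CD34+CD45RA+CLEC12A-",
  cell_d = "CD10+"
)

# TRUE only when the lengths match and the names appear in the same order
check_alignment <- function(pheno, cell_names) {
  length(pheno) == length(cell_names) && identical(names(pheno), cell_names)
}

aligned <- check_alignment(toy_pheno, toy_cells)
misaligned <- check_alignment(rev(toy_pheno), toy_cells)
print(c(aligned = aligned, misaligned = misaligned))
```

In the real workflow, the same check could be applied to pheno.merge2 and colnames(merge2) before the phenotypes are attached, assuming pheno.d10 and pheno.d13 carry cell names in the same way pheno.d7 does.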

The next step would be to run PCA on our top genes. This has already been done; we can list the stored results using the reducedDimNames command.

In [4]:
# PCA has already been done on the top genes
reducedDimNames(merge2)
  1. 'PCA.cc'
  2. 'UMAP.cc'
  3. 'PCA.5k'
  4. 'PCA.nocc'
  5. 'UMAP.nocc'
  6. 'TSNE.nocc'
  7. 'TSNE.5k'
  8. 'PCA'
  9. 'TSNE'

OK, nice. Next, we add the phenotype metadata.

In [5]:
# Add phenotypes as a column in colData
colData(merge2)$Phenotype <- pheno.merge2

There are too many individual phenotypes here to interpret directly. Let's break the data into our 3 populations of interest and place everything else in an Other category, to better address our question.

In [6]:
# Establishing Population Groups

# Define phenotype groups
phenotype_groups <- list(
  Raneg_Cneg = c("CD34+CD45RA-CLEC12A-", "CD34-CD45RA-CLEC12A-"), # Ra-C-
  Rapos_Cneg = c("CD34+CD45RA+CLEC12A-", "CD34-CD45RA+CLEC12A-"), # Ra+C-
  Cpos = c("CD34-CD45RA-CLEC12A+", "CD34+CD45RA-CLEC12A+", "CD34+CD45RA+CLEC12A+", "CD34-CD45RA+CLEC12A+"), # C+ # nolint
  Other = c("CD10+", "CD14CD15+") # Pro-B, Pro-NM; FW gating from a flow cytometer # nolint
)

# Assign group labels to phenotypes
group_labels <- sapply(pheno.merge2, function(phenotype) {
  group <- names(phenotype_groups)[sapply(phenotype_groups, function(g) phenotype %in% g)] # nolint
  if (length(group) > 0) group else "Other"
})

# Add group labels to colData of the SCE object
colData(merge2)$Group <- group_labels
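The nested sapply above tests every phenotype against every group. An alternative is to flatten the group list into a lookup vector once and index into it. This is a sketch on toy data, not the notebook's method; phenotype_groups_toy, lookup and assign_groups are hypothetical names:

```r
# Toy subset of the phenotype groups defined above
phenotype_groups_toy <- list(
  Raneg_Cneg = c("CD34+CD45RA-CLEC12A-", "CD34-CD45RA-CLEC12A-"),
  Cpos = c("CD34-CD45RA-CLEC12A+", "CD34+CD45RA-CLEC12A+")
)

# Flatten the list into a named vector: names are phenotypes, values are groups
lookup <- setNames(
  rep(names(phenotype_groups_toy), lengths(phenotype_groups_toy)),
  unlist(phenotype_groups_toy, use.names = FALSE)
)

# Map each phenotype to its group; anything not in the lookup becomes "Other"
assign_groups <- function(phenotypes, lookup) {
  groups <- unname(lookup[phenotypes])
  groups[is.na(groups)] <- "Other"
  groups
}

toy_phenotypes <- c("CD34+CD45RA-CLEC12A-", "CD34+CD45RA-CLEC12A+", "CD10+")
toy_groups <- assign_groups(toy_phenotypes, lookup)
print(toy_groups)
```

The lookup form always returns a plain character vector, whereas the sapply form would return a list if a phenotype ever matched more than one group.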

Nice! We can now begin visualizing the UMAP using the ggplot package.

In [7]:
options(repr.plot.width = 12, repr.plot.height = 8)

umap_df <- reducedDim(merge2, "UMAP.cc") %>%
  as.data.frame() %>%
  mutate(Group = merge2$Group)


gg_umap <- ggplot(umap_df, aes(x = UMAP1, y = UMAP2, color = Group)) +
  geom_point(alpha = 0.8, size = 1) +
  labs(
    title = "UMAP of Day 7 - Day 13 Gene Expression by Group",
    x = "UMAP1",
    y = "UMAP2",
    color = "Group"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    legend.position = "bottom",
    legend.text = element_text(size = 12), # Increase legend text size
    legend.key.size = unit(1.5, "cm") # Increase legend color box size
  )

# Print the UMAP plot
print(gg_umap)
[Figure: UMAP of Day 7 - Day 13 Gene Expression by Group (UMAP1 vs. UMAP2, colored by Group)]

It appears that we have outliers, with values exceeding UMAP1 = 4 on the x-axis. Let's get rid of these, and then continue with our analysis.

In [8]:
# Identify non-outlier cells
valid_cells <- which(reducedDim(merge2, "UMAP.cc")[, 1] < 4) # Filtering UMAP1 < 4

# Subset SingleCellExperiment object to keep only valid cells
merge2_clean <- merge2[, valid_cells]

# Confirming we get the same plot, minus the outliers
umap_df <- reducedDim(merge2_clean, "UMAP.cc") %>%
  as.data.frame() %>%
  mutate(Group = merge2_clean$Group)


gg_umap <- ggplot(umap_df, aes(x = UMAP1, y = UMAP2, color = Group)) +
  geom_point(alpha = 0.8, size = 1) +
  labs(
    title = "UMAP of Day 7 - Day 13 Gene Expression by Group",
    x = "UMAP1",
    y = "UMAP2",
    color = "Group"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    legend.position = "bottom",
    legend.text = element_text(size = 12), # Increase legend text size
    legend.key.size = unit(1.5, "cm") # Increase legend color box size
  )
gg_umap
[Figure: UMAP of Day 7 - Day 13 Gene Expression by Group, after outlier removal]
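When filtering on a threshold like this, it helps to record how many cells were dropped. A minimal sketch on a toy coordinate matrix (toy_umap and its values are made up for illustration; only the UMAP1 < 4 cutoff comes from the cell above):

```r
# Toy UMAP coordinates; the last two cells sit beyond the UMAP1 = 4 cutoff
toy_umap <- cbind(
  UMAP1 = c(-3.2, -1.0, 0.4, 5.1, 6.3),
  UMAP2 = c(1.5, -2.0, 0.3, 0.1, -0.4)
)
rownames(toy_umap) <- paste0("cell_", seq_len(nrow(toy_umap)))

# Logical mask of cells to keep, mirroring the UMAP1 < 4 filter
keep <- toy_umap[, "UMAP1"] < 4

# drop = FALSE keeps the result a matrix even if only one cell survives
toy_clean <- toy_umap[keep, , drop = FALSE]

n_removed <- sum(!keep)
cat("Removed", n_removed, "of", nrow(toy_umap), "cells\n")
```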

Next, let's add a column recording which day each cell belongs to, so that we can also analyze the differences among the three populations between time points.

In [9]:
# Extract the day number from cell names (gsub returns names without "Day_<n>" unchanged)
colData(merge2_clean)$Day <- gsub(".*Day_([0-9]+).*", "\\1", rownames(colData(merge2_clean)))
# Convert to a factor with levels in numeric order (a character sort would give "10", "13", "7")
day_levels <- unique(colData(merge2_clean)$Day)
colData(merge2_clean)$Day <- factor(colData(merge2_clean)$Day, levels = day_levels[order(as.integer(day_levels))])

colnames(colData(merge2_clean))
  1. 'Sample'
  2. 'Barcode'
  3. 'sum'
  4. 'detected'
  5. 'subsets_Mito_sum'
  6. 'subsets_Mito_detected'
  7. 'subsets_Mito_percent'
  8. 'altexps_Antibody Capture_sum'
  9. 'altexps_Antibody Capture_detected'
  10. 'altexps_Antibody Capture_percent'
  11. 'total'
  12. 'sizeFactor'
  13. 'label'
  14. 'scDblFinder.cluster'
  15. 'scDblFinder.class'
  16. 'scDblFinder.score'
  17. 'scDblFinder.weighted'
  18. 'scDblFinder.difficulty'
  19. 'scDblFinder.cxds_score'
  20. 'scDblFinder.mostLikelyOrigin'
  21. 'scDblFinder.originAmbiguous'
  22. 'batch'
  23. 'Phenotype'
  24. 'Group'
  25. 'Day'
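One detail worth checking on the Day column is the order of the factor levels, because day labels sorted as character strings do not follow numeric order. A small sketch with made-up cell names (the "Day_<n>_<barcode>" format is an assumption for illustration, not taken from the data):

```r
# Hypothetical cell names that embed the day as "Day_<n>"
toy_names <- c("Day_7_AAAC-1", "Day_10_AAAG-1", "Day_13_AAAT-1", "Day_7_AACC-1")

# Same regular expression as in the cell above: keep only the digits after "Day_"
day_chr <- gsub(".*Day_([0-9]+).*", "\\1", toy_names)

# Character sort compares strings digit by digit, so "10" and "13" precede "7"
alpha_levels <- sort(unique(day_chr))

# Sorting as integers first gives chronological order
numeric_levels <- as.character(sort(unique(as.integer(day_chr))))

day_factor <- factor(day_chr, levels = numeric_levels)
print(alpha_levels)
print(levels(day_factor))
```

Level order does not change any per-cell values, but it does control the order of facets, legends and table columns in later plots and summaries.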

Analysis Workflow

RQ 1: Are There Unique Regions in Each of the 3 Populations of the GeneSpace?

In this section, we will be looking at how the three populations are similar, and how they are different, at a genespace level. The main goal is to establish how cell fate differs among the 3 populations.

Let's begin by visualizing the 3 populations in the genespace, and look for qualitative differences and similarities in their population spread.

In [10]:
options(repr.plot.width = 12, repr.plot.height = 8)

gg_AllDay_AllPop <- gg_umap

gg_Cpos <- ggplot(umap_df, aes(x = UMAP1, y = UMAP2)) +
  geom_point(aes(color = ifelse(Group == "Cpos", "C+", "Other")),
    alpha = ifelse(umap_df$Group == "Cpos", 1, 0.2), size = 1
  ) +
  scale_color_manual(
    values = c("C+" = "purple", "Other" = "gray"),
    name = "Group"
  ) +
  labs(
    title = "C+ Population Overlaid on Primary UMAP",
    x = "UMAP1",
    y = "UMAP2"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    legend.position = "right",
    legend.text = element_text(size = 12),
    legend.title = element_text(size = 14, face = "bold")
  )

gg_RanegCneg <- ggplot(umap_df, aes(x = UMAP1, y = UMAP2)) +
  geom_point(aes(color = ifelse(Group == "Raneg_Cneg", "Ra-C-", "Other")),
    alpha = ifelse(umap_df$Group == "Raneg_Cneg", 1, 0.2), size = 1
  ) +
  scale_color_manual(
    values = c("Ra-C-" = "black", "Other" = "gray"),
    name = "Group"
  ) +
  labs(
    title = "Ra-C- Population Overlaid on Primary UMAP",
    x = "UMAP1",
    y = "UMAP2"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    legend.position = "right",
    legend.text = element_text(size = 12),
    legend.title = element_text(size = 14, face = "bold")
  ) #+
# scale_x_continuous(breaks = seq(-7, 1, by = 1)) + # More x-axis ticks
# scale_y_continuous(breaks = seq(-7, 6, by = 1)) # More y-axis ticks


gg_RaposCneg <- ggplot(umap_df, aes(x = UMAP1, y = UMAP2)) +
  geom_point(aes(color = ifelse(Group == "Rapos_Cneg", "Ra+C-", "Other")),
    alpha = ifelse(umap_df$Group == "Rapos_Cneg", 1, 0.2), size = 1
  ) +
  scale_color_manual(
    values = c("Ra+C-" = "blue", "Other" = "gray"),
    name = "Group"
  ) +
  labs(
    title = "Ra+C- Population Overlaid on Primary UMAP",
    x = "UMAP1",
    y = "UMAP2"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    legend.position = "right",
    legend.text = element_text(size = 12),
    legend.title = element_text(size = 14, face = "bold")
  )


# Print the three highlighted UMAP plots
print(gg_Cpos)
print(gg_RanegCneg)
print(gg_RaposCneg)
[Figure: C+ Population Overlaid on Primary UMAP]
[Figure: Ra-C- Population Overlaid on Primary UMAP]
[Figure: Ra+C- Population Overlaid on Primary UMAP]
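The three plots above differ only in which group is highlighted. As a sketch of how the repeated ifelse calls could be factored out, the helper below adds the label and transparency as columns. It is base R only, and add_highlight, Highlight and PointAlpha are hypothetical names, not part of the notebook:

```r
# Add a two-level label and a per-point alpha for one group of interest
add_highlight <- function(df, group, label) {
  is_target <- df$Group == group
  df$Highlight <- ifelse(is_target, label, "Other")
  df$PointAlpha <- ifelse(is_target, 1, 0.2)
  df
}

# Toy data frame with the same Group column as umap_df
toy_df <- data.frame(
  UMAP1 = c(-3.0, -5.0, -2.0),
  UMAP2 = c(4.0, 0.0, -5.0),
  Group = c("Cpos", "Raneg_Cneg", "Cpos")
)

toy_highlighted <- add_highlight(toy_df, group = "Cpos", label = "C+")
print(toy_highlighted)
```

The resulting columns could then be mapped inside aes(), for example color = Highlight and alpha = PointAlpha together with ggplot2's scale_alpha_identity(), so that each plot only needs a different group and color.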

Nice! Let's now create a side-by-side visualization, showing the distribution of each of the 3 populations overlaid on the same UMAP.

In [11]:
# Makes the plot nice and wide.
options(repr.plot.width = 28, repr.plot.height = 14)


# Combine the three plots side by side
gg_combined <- gg_Cpos + gg_RanegCneg + gg_RaposCneg + plot_layout(ncol = 3)

# Print the combined plot
print(gg_combined)
[Figure: C+, Ra-C- and Ra+C- overlay plots side by side]

Very interesting. Here are some take-aways from this distribution; they suggest regions we could manually subset and complete DGE on, to look for more pronounced differences between groups:

  1. All groups overlap along the diagonal, which is their primary shared region.

  2. The island at the bottom left is dominated by C+ cells and little else, so it appears to be a near-homogeneous C+ population within this dataset. Similarly, the peak at the top of the plot is also dominated by C+ cells, and could be another subset to look at when completing DGE.

  3. The larger island, on a diagonal from roughly -4 to -7 on UMAP1, seems to be primarily occupied by the Ra-C- population. Additionally, the bottom right peak also seems to be enriched for this population.

  4. The small, long island at the bottom is strongly enriched for Ra+C- cells, but its lower half has some overlap with the C+ population. Additionally, there is a small enrichment in the top left, which is unique to this population, but it is limited.

Here are the regions of interest, highlighted with circles:

In [12]:
display_jpeg(file = "img/3Unique.jpg")
[Figure: img/3Unique.jpg, the UMAP with the regions of interest circled]

RQ 2: What is the Genetic Makeup of Regions Unique to Each Population? Does it Have Functional Implications?

Now that we have identified these unique regions, let's take a look at which genes are over-expressed in these locations, compared to the rest of the UMAP. This will give some hints about what is functionally unique about these cells within each population, perhaps pointing to the existence of a subcluster or subpopulation with a linked fate.

To begin, we must subset our merge2 SCE object (merge2_clean) to contain only the cells that fall in these regions. Let's do that below:

In [13]:
options(repr.plot.width = 12, repr.plot.height = 8)

## Segmenting Unique CPos Cells
CPos_R1_pooled <- rownames(subset(umap_df, UMAP1 > -3.75 & UMAP1 < -2.25 &
  UMAP2 > 2 & UMAP2 < 6)) # nolint Square region R1
CPos_R2_pooled <- rownames(subset(umap_df, UMAP1 > -2.5 & UMAP1 < -1.5 &
  UMAP2 > 3 & UMAP2 < 5)) # nolint Square region R2
CPos_R3_pooled <- rownames(subset(umap_df, UMAP1 > -4.5 & UMAP1 < -3.25 &
  UMAP2 > -5.5 & UMAP2 < -2.5)) # nolint Square region R3
CPos_R4_pooled <- rownames(subset(umap_df, UMAP1 < -1.75 & UMAP1 > -3 &
  -4.5 < UMAP2 & UMAP2 < -2.5)) # nolint Square region R4
CPos_R5_pooled <- rownames(subset(umap_df, UMAP1 > -3 & UMAP1 < -1.5 &
  -5.75 < UMAP2 & UMAP2 < -4.5)) # nolint Square region R5

# Combine all unique C+ Cells into one vector
Cpos_unique_pooled <- unique(c(
  CPos_R1_pooled, CPos_R2_pooled,
  CPos_R3_pooled, CPos_R4_pooled, CPos_R5_pooled
))

# Subset the SCE object to retain only cells in any of the selected squares
merge2_CPosUnique <- merge2_clean[, Cpos_unique_pooled]

# Visualizing the unique C+ cells
Cpos_subset_plot <- ggplot(umap_df, aes(x = UMAP1, y = UMAP2)) +
  geom_point(alpha = 0.1) + # Lightly plot all cells
  geom_point(data = umap_df[Cpos_unique_pooled, ], color = "purple", alpha = 0.6) + # Highlight selected cells
  theme_minimal()



## Segmenting Unique Ra-C- Cells
RanegCneg_R1_pooled <- rownames(subset(umap_df, UMAP1 > -7 & UMAP1 < -3.5 &
  UMAP2 > -2.65 & UMAP2 < (-1.5 * UMAP1 + (-6.0)))) # nolint Region R1: below the line UMAP2 = -1.5 * UMAP1 - 6
RanegCneg_R2_pooled <- rownames(subset(
  umap_df,
  UMAP2 < 0 & UMAP2 > (-2.8 * UMAP1 + (-5.0))
)) # nolint Region R2: below UMAP2 = 0 and above the line UMAP2 = -2.8 * UMAP1 - 5
RanegCneg_R3_pooled <- rownames(subset(umap_df, UMAP1 > -1.25 & UMAP1 < 0.5 &
  UMAP2 > 0 & UMAP2 < 3)) # nolint Square region R3

# Combine all unique Ra-C- Cells into one vector
RanegCneg_unique_pooled <- unique(c(
  RanegCneg_R1_pooled, RanegCneg_R2_pooled,
  RanegCneg_R3_pooled
))

# Subset the SCE object to retain only cells in any of the selected squares
merge2_RanegCnegUnique <- merge2_clean[, RanegCneg_unique_pooled]

# Visualizing the unique Ra-C- cells
Raneg_Cneg_subset_plot <- ggplot(umap_df, aes(x = UMAP1, y = UMAP2)) +
  geom_point(alpha = 0.1) + # Lightly plot all cells
  geom_point(data = umap_df[RanegCneg_unique_pooled, ], color = "black", alpha = 0.6) + # Highlight selected cells
  theme_minimal()



## Segmenting Unique Ra+C- Cells
RaposCneg_R1_pooled <- rownames(subset(umap_df, UMAP1 > -5.5 & UMAP1 < -4.25 &
  UMAP2 > 3.5 & UMAP2 < 5.5)) # nolint Square region R1
RaposCneg_R2_pooled <- rownames(subset(umap_df, UMAP1 > -2 & UMAP1 < -1.25 &
  UMAP2 > -5.5 & UMAP2 < -4)) # nolint Square region R2

# Combine all unique Ra+C- Cells into one vector
RaposCneg_unique_pooled <- unique(c(
  RaposCneg_R1_pooled, RaposCneg_R2_pooled
))

# Subset the SCE object to retain only cells in any of the selected squares
merge2_RaposCnegUnique <- merge2_clean[, RaposCneg_unique_pooled]

# Visualizing the unique Ra+C- cells
Rapos_Cneg_subset_plot <- ggplot(umap_df, aes(x = UMAP1, y = UMAP2)) +
  geom_point(alpha = 0.1) + # Lightly plot all cells
  geom_point(data = umap_df[RaposCneg_unique_pooled, ], color = "blue", alpha = 0.6) + # Highlight selected cells
  theme_minimal()


# Plotting 3 Subsets side by side
options(repr.plot.width = 28, repr.plot.height = 14)

# Combine the three plots side by side
gg_uniquePops_combined <- Cpos_subset_plot + Raneg_Cneg_subset_plot + Rapos_Cneg_subset_plot + plot_layout(ncol = 3)

# Print the combined plot
print(gg_uniquePops_combined)
[Figure: selected C+ (purple), Ra-C- (black) and Ra+C- (blue) region cells highlighted on the UMAP, side by side]
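The regions above are built from two kinds of boundaries: axis-aligned rectangles and sloped lines. The sketch below reproduces both on a toy data frame; the coordinates of the toy cells are made up, while the boundary values are copied from the C+ R1 and Ra-C- R1 definitions above. in_rect is a hypothetical helper:

```r
# Toy UMAP coordinates, with cell names as row names
toy_umap <- data.frame(
  UMAP1 = c(-3.0, -2.0, -5.0, -4.0, 0.0),
  UMAP2 = c(4.0, 4.0, 0.0, 3.0, 1.0),
  row.names = paste0("cell_", 1:5)
)

# TRUE for cells strictly inside an axis-aligned rectangle
in_rect <- function(df, xmin, xmax, ymin, ymax) {
  df$UMAP1 > xmin & df$UMAP1 < xmax & df$UMAP2 > ymin & df$UMAP2 < ymax
}

# Rectangle with the C+ R1 bounds
rect_cells <- rownames(toy_umap)[in_rect(toy_umap, -3.75, -2.25, 2, 6)]

# Ra-C- R1 bounds: a vertical strip with a floor at UMAP2 = -2.65,
# capped by the sloped line UMAP2 = -1.5 * UMAP1 - 6
below_line <- toy_umap$UMAP2 < (-1.5 * toy_umap$UMAP1 - 6)
in_strip <- toy_umap$UMAP1 > -7 & toy_umap$UMAP1 < -3.5 & toy_umap$UMAP2 > -2.65
wedge_cells <- rownames(toy_umap)[in_strip & below_line]

print(rect_cells)
print(wedge_cells)
```

Note that these selections depend only on UMAP coordinates, so each region contains cells from every population that falls inside it, which is why the vectors above carry the "_pooled" suffix.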
In [14]:
# Initialize the Region column with a default value of "None"
colData(merge2_clean)$Region <- "None"

# Combine all Regions cell names into a single vector
all_R1_cells <- unique(c(RaposCneg_R1_pooled, RanegCneg_R1_pooled, CPos_R1_pooled))
all_R2_cells <- unique(c(RaposCneg_R2_pooled, RanegCneg_R2_pooled, CPos_R2_pooled))
all_R3_cells <- unique(c(RanegCneg_R3_pooled, CPos_R3_pooled))
all_R4_cells <- unique(CPos_R4_pooled)
all_R5_cells <- unique(CPos_R5_pooled)


# Ensure that we're only using valid cell names
valid_R1_cells <- intersect(all_R1_cells, colnames(merge2_clean))
valid_R2_cells <- intersect(all_R2_cells, colnames(merge2_clean))
valid_R3_cells <- intersect(all_R3_cells, colnames(merge2_clean))
valid_R4_cells <- intersect(all_R4_cells, colnames(merge2_clean))
valid_R5_cells <- intersect(all_R5_cells, colnames(merge2_clean))


# Assign region labels; where regions overlap, later assignments overwrite earlier ones
colData(merge2_clean)$Region[colnames(merge2_clean) %in% valid_R1_cells] <- "R1"
colData(merge2_clean)$Region[colnames(merge2_clean) %in% valid_R2_cells] <- "R2"
colData(merge2_clean)$Region[colnames(merge2_clean) %in% valid_R3_cells] <- "R3"
colData(merge2_clean)$Region[colnames(merge2_clean) %in% valid_R4_cells] <- "R4"
colData(merge2_clean)$Region[colnames(merge2_clean) %in% valid_R5_cells] <- "R5"

# Verify the result
table(colData(merge2_clean)$Region)
None   R1   R2   R3   R4   R5 
5125 2882 1961 1316  511  384 
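Because the labels are assigned one region at a time, a cell that falls inside two regions keeps only the last label written. Some of the rectangles defined earlier do overlap (for example, the C+ R1 and C+ R2 bounds share the strip with UMAP1 between -2.5 and -2.25 and UMAP2 between 3 and 5). A toy sketch of the effect, with made-up cell names and region sets:

```r
toy_cells <- paste0("cell_", 1:5)

# cell_2 belongs to both toy regions
region_sets <- list(
  R1 = c("cell_1", "cell_2"),
  R2 = c("cell_2", "cell_3")
)

# Start with "None", then label region by region, as in the cell above
region <- setNames(rep("None", length(toy_cells)), toy_cells)
for (r in names(region_sets)) {
  region[toy_cells %in% region_sets[[r]]] <- r
}

# Cells that were claimed by more than one region
overlap <- intersect(region_sets$R1, region_sets$R2)

print(region)
print(overlap)
```

Checking intersect() between the region vectors before labelling shows how many cells are affected by the assignment order.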

Additionally, we will repeatedly be comparing different regions and different populations against each other. Instead of writing the same script over and over again, I have defined the function perform_dge to do this work in one place, saving space in this notebook.

Here is some documentation for this function:

Arguments: The function takes a minimum of 3 and a maximum of 5 arguments:

  • sce_object: An SCE object (in this analysis, merge2_clean) containing the experimental CITE-Seq data.
  • target_pop: The population you want to analyze.
  • target_region: The region in the target population that you want to compare.
  • comp_pop: Optional - the other population you want to compare to.
  • comp_region: Optional - the other population's region you want to compare to.

Note: Only the first 3 arguments are necessary. If only these are passed, the function will compare the target region to the remaining cells within the target population.

Finally, the function returns a list containing the top differentially expressed genes (top_genes) and the top differentially expressed CD-marker-associated genes (top_cd_genes).

As a reminder, here is a table containing all the populations and their regions of interest, as defined above:

| Population | Regions | Description |
|---|---|---|
| "Raneg_Cneg" | "R1", "R2", "R3" | CD45Ra-Clec12A- population |
| "Rapos_Cneg" | "R1", "R2" | CD45Ra+Clec12A- population |
| "Cpos" | "R1", "R2", "R3", "R4", "R5" | Clec12A+ population |
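To make the "target region versus the rest" comparison concrete, here is a base R sketch of what the design matrix and contrast compute for the logFC column. It uses random toy data and ordinary least squares only; it is not limma and leaves out the moderated statistics, and every object name in it is hypothetical:

```r
set.seed(42)

# Ten toy cells: six labelled "Other", four in the target region "R1"
group <- factor(rep(c("Other", "R1"), times = c(6, 4)))

# Toy log-expression matrix: 3 genes x 10 cells
expr <- matrix(
  rnorm(3 * 10),
  nrow = 3,
  dimnames = list(paste0("gene_", 1:3), paste0("cell_", 1:10))
)

# No-intercept design: one indicator column per group
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Least-squares fit per gene; with this design each coefficient is a group mean
coefs <- t(solve(crossprod(design), crossprod(design, t(expr))))

# The contrast "R1 - Other" is the difference of the two coefficients
contrast <- c(Other = -1, R1 = 1)
log_fc <- drop(coefs %*% contrast[colnames(design)])

# The same quantity computed directly from group means
manual <- rowMeans(expr[, group == "R1"]) - rowMeans(expr[, group == "Other"])

print(round(cbind(log_fc, manual), 4))
```

So a positive logFC means higher average log-expression in the target region than in the cells labelled "Other".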

In [15]:
perform_dge <- function(sce_object, target_pop, target_region, comp_pop = NULL, comp_region = NULL) {
  # Identify target cells
  target_cells <- colnames(sce_object)[colData(sce_object)$Group == target_pop]
  sce_target <- sce_object[, target_cells]

  # Identify cells within the target region
  target_region_cells <- colnames(sce_target)[colData(sce_target)$Region == target_region] # nolint

  # Determine comparison group
  if (!is.null(comp_pop) && !is.null(comp_region)) {
    # Region-vs-region comparison: keep only the target-region cells and the
    # cells of comp_pop that lie in comp_region
    comp_region_cells <- colnames(sce_object)[colData(sce_object)$Group == comp_pop &
      colData(sce_object)$Region == comp_region] # nolint
    sce_target <- sce_object[, c(target_region_cells, comp_region_cells)]
  }
  # Label target-region cells; every other retained cell is "Other"
  colData(sce_target)$ComparisonGroup <- ifelse(colnames(sce_target) %in% target_region_cells, # nolint
    target_region, "Other"
  )

  # Extract logcounts assay
  expr_matrix <- assay(sce_target, "logcounts")

  # Create a design matrix
  design <- model.matrix(~ 0 + colData(sce_target)$ComparisonGroup)
  colnames(design) <- levels(factor(colData(sce_target)$ComparisonGroup))

  # Fit the linear model
  fit <- lmFit(expr_matrix, design)

  # Define the contrast
  contrast <- makeContrasts(contrasts = paste0(target_region, "-Other"), levels = design) # nolint
  fit2 <- contrasts.fit(fit, contrast)
  fit2 <- eBayes(fit2)


  # Extract top genes
  top_genes <- topTable(fit2, coef = 1, number = Inf) %>% # only one contrast is ever fitted # nolint
    filter(P.Value < 0.05) %>% # Filter by raw (unadjusted) p-value
    arrange(desc(abs(logFC))) # Sort by absolute log-fold change

  # Extract top CD genes
  cd_genes <- top_genes[grep("CD", top_genes$ID), ]
  excluded_patterns <- "CDK|CDC|PCD|LCD|CDR|NCD|CCD|TCD|OCD|3CD|DCD|KCD"
  cd_genes <- cd_genes[!grepl(excluded_patterns, cd_genes$ID), ]

  # Return the results as a list
  return(list(top_genes = as.data.frame(top_genes), top_cd_genes = as.data.frame(cd_genes)))
}
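The CD gene filter inside perform_dge matches "CD" anywhere in the gene symbol and then removes a list of known false positives, which still lets symbols such as BICD1, SCD, C2CD2 and HACD3 through. One possible alternative, shown here only as a sketch and not as the notebook's method, is to anchor the pattern at the start of the symbol and require a digit after "CD":

```r
# Gene symbols taken from the result tables in this notebook, plus CDK1 as a
# made-up example of a symbol the exclusion list is meant to remove
toy_ids <- c("CD74", "CD79B", "BICD1", "SCD", "C2CD2", "HACD3", "CDK1", "CD9")

# "^CD[0-9]" keeps only symbols that start with "CD" followed by a digit
is_cd_marker <- grepl("^CD[0-9]", toy_ids)

cd_markers <- toy_ids[is_cd_marker]
print(cd_markers)
```

An anchored pattern would change which rows appear in top_cd_genes, so the tables in this notebook reflect the original, unanchored filter.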
Characterizing Region 1 (R1) of the Ra+C- Population

In this section we will be answering the question: what makes the Ra+C- Region 1 (R1) cells different from the remaining Ra+C- cells? We shall do so by completing DGE, and then supplementing it with a manual scrape of the top genes and an automatic Gene Ontology (GO) analysis. Let's begin with the workflow below:

In [16]:
# NOTE: target_pop = "Raneg_Cneg" selects the Ra-C- population, so the tables
# below describe Ra-C- R1; pass target_pop = "Rapos_Cneg" to analyze Ra+C-.
raPosCneg_R1_comparison <-
    perform_dge(sce_object = merge2_clean, target_pop = "Raneg_Cneg", target_region = "R1")

# Print the top genes differentially expressed in this region
head(raPosCneg_R1_comparison$top_genes, 20)

# Print the top cluster of differentiation (CD) marker genes differentially expressed in this region
head(raPosCneg_R1_comparison$top_cd_genes, 20)
Warning message in asMethod(object):
"sparse->dense coercion: allocating vector of size 1.5 GiB"
Warning message:
"Zero sample variances detected, have been offset away from zero"
A data.frame: 20 × 7

| | ID | logFC | AveExpr | t | P.Value | adj.P.Val | B |
|---|---|---|---|---|---|---|---|
| | <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| 1 | HBD | 2.600558 | 0.9332628 | 60.29083 | 0.000000e+00 | 0.000000e+00 | 1381.5478 |
| 2 | STXBP5 | 2.387527 | 1.9199339 | 69.60401 | 0.000000e+00 | 0.000000e+00 | 1722.4838 |
| 3 | ITGA2B | 2.281450 | 0.7618509 | 92.47893 | 0.000000e+00 | 0.000000e+00 | 2561.5289 |
| 4 | LTBP1 | 2.165247 | 0.6594841 | 65.40463 | 0.000000e+00 | 0.000000e+00 | 1567.9860 |
| 5 | GP1BB | 2.099730 | 0.6598842 | 72.72827 | 0.000000e+00 | 0.000000e+00 | 1837.8255 |
| 6 | RAP1B | 1.979228 | 2.1936284 | 70.07873 | 0.000000e+00 | 0.000000e+00 | 1739.9956 |
| 7 | ABCC4 | 1.925287 | 1.0928290 | 72.21383 | 0.000000e+00 | 0.000000e+00 | 1818.8220 |
| 8 | SPINK2 | -1.924872 | 1.7426931 | -69.27794 | 0.000000e+00 | 0.000000e+00 | 1710.4599 |
| 9 | PLCB1 | -1.875570 | 1.8908794 | -59.28000 | 0.000000e+00 | 0.000000e+00 | 1345.0226 |
| 10 | RNF220 | -1.828865 | 2.5707110 | -57.12300 | 0.000000e+00 | 0.000000e+00 | 1267.5479 |
| 11 | SLC24A3 | 1.793877 | 0.6780904 | 57.21880 | 0.000000e+00 | 0.000000e+00 | 1270.9741 |
| 12 | MED12L | 1.747457 | 0.8803669 | 63.72476 | 0.000000e+00 | 0.000000e+00 | 1506.4814 |
| 13 | C1QTNF4 | -1.742253 | 1.5501189 | -60.94412 | 0.000000e+00 | 0.000000e+00 | 1405.2200 |
| 14 | PLXDC2 | 1.705843 | 0.9520478 | 58.30154 | 0.000000e+00 | 0.000000e+00 | 1309.7954 |
| 15 | SH3BGRL3 | 1.699835 | 2.6590816 | 58.64659 | 0.000000e+00 | 0.000000e+00 | 1322.2032 |
| 16 | ATP8B4 | -1.679106 | 1.6559922 | -52.94559 | 0.000000e+00 | 0.000000e+00 | 1119.7040 |
| 17 | NKAIN2 | -1.664888 | 2.7157017 | -37.30191 | 1.739818e-271 | 3.557491e-269 | 608.1647 |
| 18 | RAB27B | 1.629171 | 0.9554767 | 58.10343 | 0.000000e+00 | 0.000000e+00 | 1302.6790 |
| 19 | UBE2C | 1.595314 | 0.9767416 | 53.48448 | 0.000000e+00 | 0.000000e+00 | 1138.5884 |
| 20 | LAT | 1.566811 | 0.4916719 | 72.43056 | 0.000000e+00 | 0.000000e+00 | 1826.8277 |

A data.frame: 20 × 7

| | ID | logFC | AveExpr | t | P.Value | adj.P.Val | B |
|---|---|---|---|---|---|---|---|
| | <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| 33 | CD74 | -1.3886235 | 2.2728535 | -44.70930 | 0.000000e+00 | 0.000000e+00 | 840.1121 |
| 95 | CD99 | -1.0149487 | 1.4580142 | -43.25054 | 0.000000e+00 | 0.000000e+00 | 792.7357 |
| 99 | CD84 | 0.9984581 | 0.4955969 | 53.73067 | 0.000000e+00 | 0.000000e+00 | 1147.2357 |
| 107 | CD48 | -0.9684666 | 0.8941089 | -44.67238 | 0.000000e+00 | 0.000000e+00 | 838.9039 |
| 126 | CD63 | 0.9135788 | 2.2570242 | 37.46660 | 1.289011e-273 | 2.759011e-271 | 613.0672 |
| 142 | CD36 | 0.8744668 | 0.3246596 | 36.77724 | 9.774527e-265 | 1.873076e-262 | 592.6316 |
| 143 | CD34 | -0.8677303 | 1.3150180 | -35.24866 | 2.094257e-245 | 3.376736e-243 | 548.1467 |
| 177 | CD44 | -0.8126885 | 1.2309836 | -32.72168 | 1.334583e-214 | 1.732166e-212 | 477.2655 |
| 195 | CD55 | 0.7716636 | 0.5686104 | 39.33549 | 3.607161e-298 | 9.105220e-296 | 669.5748 |
| 275 | CD52 | -0.6601983 | 1.5512427 | -23.33758 | 6.625781e-115 | 3.587429e-113 | 247.9604 |
| 318 | CD53 | -0.6224523 | 0.8918330 | -28.84401 | 2.102008e-170 | 1.904346e-168 | 375.5861 |
| 529 | CD164 | -0.4921140 | 1.8707070 | -20.19441 | 1.613185e-87 | 6.567762e-86 | 185.0219 |
| 552 | CD200 | -0.4793880 | 0.4247843 | -28.94135 | 1.828026e-171 | 1.681095e-169 | 378.0259 |
| 659 | BICD1 | -0.4351051 | 1.0190811 | -19.46995 | 1.103106e-81 | 4.153786e-80 | 171.6179 |
| 839 | CD69 | 0.3802578 | 0.6807524 | 16.06343 | 9.060894e-57 | 2.367150e-55 | 114.4202 |
| 847 | CD37 | -0.3784251 | 1.4375666 | -16.07689 | 7.365320e-57 | 1.926934e-55 | 114.6266 |
| 970 | CD9 | 0.3453646 | 0.1117994 | 24.97147 | 2.049375e-130 | 1.293262e-128 | 283.6182 |
| 992 | CD226 | 0.3390867 | 0.1304901 | 27.69840 | 4.042864e-158 | 3.302966e-156 | 347.3312 |
| 1017 | CD82 | 0.3335424 | 0.5675827 | 17.01665 | 2.590232e-63 | 7.512289e-62 | 129.4362 |
| 1041 | CD302 | -0.3262979 | 0.5838855 | -18.37965 | 2.984599e-73 | 1.005887e-71 | 152.2521 |

Characterizing Region 2 (R2) of the Ra+C- Population

In this section we will be answering the question: what makes the Ra+C- Region 2 (R2) cells different from the remaining Ra+C- cells? We shall do so by completing DGE, and then supplementing it with a manual scrape of the top genes and an automatic Gene Ontology (GO) analysis. Note that this is a repetition of the previous workflow:

In [17]:
# NOTE: target_pop = "Raneg_Cneg" selects the Ra-C- population, so the tables
# below describe Ra-C- R2; pass target_pop = "Rapos_Cneg" to analyze Ra+C-.
raPosCneg_R2_comparison <-
    perform_dge(sce_object = merge2_clean, target_pop = "Raneg_Cneg", target_region = "R2")

# Print the top genes differentially expressed in this region
head(raPosCneg_R2_comparison$top_genes, 20)

# Print the top cluster of differentiation (CD) marker genes differentially expressed in this region
head(raPosCneg_R2_comparison$top_cd_genes, 20)
Warning message in asMethod(object):
"sparse->dense coercion: allocating vector of size 1.5 GiB"
Warning message:
"Zero sample variances detected, have been offset away from zero"
A data.frame: 20 × 7

| | ID | logFC | AveExpr | t | P.Value | adj.P.Val | B |
|---|---|---|---|---|---|---|---|
| | <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| 1 | NKAIN2 | 1.6173566 | 2.7157017 | 35.88455 | 2.191341e-253 | 2.673509e-249 | 566.5133 |
| 2 | HIST1H4C | -1.3112881 | 2.2538890 | -28.05665 | 6.383785e-162 | 3.894215e-159 | 356.0775 |
| 3 | HMGB2 | -1.2863952 | 2.6003865 | -38.38509 | 1.345418e-285 | 2.462181e-281 | 640.6437 |
| 4 | TOP2A | -1.1748550 | 1.2552635 | -34.55373 | 8.701558e-237 | 3.981071e-233 | 528.3167 |
| 5 | UBE2C | -1.1563485 | 0.9767416 | -34.60196 | 2.213043e-237 | 1.157137e-233 | 529.6850 |
| 6 | TUBB4B | -1.1293153 | 1.5703254 | -35.09399 | 1.772643e-243 | 1.297610e-239 | 543.7134 |
| 7 | HBD | -1.1049158 | 0.9332628 | -20.55624 | 1.679011e-90 | 2.301628e-88 | 191.8769 |
| 8 | HLA-DRA | 1.0408271 | 2.3423770 | 26.37118 | 2.330121e-144 | 9.072847e-142 | 315.6858 |
| 9 | MSI2 | 1.0300816 | 2.5742865 | 30.68656 | 6.450141e-191 | 1.026442e-187 | 422.7771 |
| 10 | RNF220 | 1.0277936 | 2.5707110 | 26.97889 | 1.350358e-150 | 6.101786e-148 | 330.0294 |
| 11 | UBE2S | -1.0216306 | 1.3800974 | -33.73812 | 8.204519e-227 | 2.144954e-223 | 505.3652 |
| 12 | SPINK2 | 1.0212230 | 1.7426931 | 28.70188 | 7.356749e-169 | 5.609675e-166 | 372.0369 |
| 13 | HIST1H1B | -1.0188295 | 1.0765159 | -29.62785 | 5.103361e-179 | 5.336803e-176 | 395.4053 |
| 14 | HOPX | 1.0046959 | 1.1811817 | 35.59054 | 1.100540e-249 | 1.007022e-245 | 557.9968 |
| 15 | MKI67 | -0.9951843 | 1.0553881 | -34.98354 | 4.186815e-242 | 2.554027e-238 | 540.5534 |
| 16 | CENPF | -0.9712544 | 1.3730378 | -29.45835 | 3.845265e-177 | 3.703698e-174 | 391.0874 |
| 17 | CD74 | 0.9527469 | 2.2728535 | 28.00636 | 2.192806e-161 | 1.294498e-158 | 354.8448 |
| 18 | PLCB1 | 0.9493143 | 1.8908794 | 24.61417 | 5.956410e-127 | 1.639177e-124 | 275.6575 |
| 19 | RRM2 | -0.9471306 | 0.8967019 | -38.44819 | 1.993591e-286 | 7.296741e-282 | 642.5521 |
| 20 | NRIP1 | 0.9264414 | 2.4574898 | 29.47771 | 2.349684e-177 | 2.324345e-174 | 391.5795 |

A data.frame: 20 × 7

| | ID | logFC | AveExpr | t | P.Value | adj.P.Val | B |
|---|---|---|---|---|---|---|---|
| | <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> |
| 17 | CD74 | 0.9527469 | 2.2728535 | 28.006355 | 2.192806e-161 | 1.294498e-158 | 354.84484 |
| 45 | CD52 | 0.7753846 | 1.5512427 | 27.875007 | 5.459927e-160 | 3.122481e-157 | 351.63352 |
| 64 | CD99 | 0.6737835 | 1.4580142 | 26.238619 | 5.168612e-143 | 1.950272e-140 | 312.59046 |
| 100 | CD48 | 0.5969166 | 0.8941089 | 24.800467 | 9.422292e-129 | 2.715475e-126 | 279.79801 |
| 136 | CD34 | 0.5490061 | 1.3150180 | 20.871016 | 3.923522e-93 | 5.544588e-91 | 197.92294 |
| 138 | CD37 | 0.5443091 | 1.4375666 | 23.676609 | 4.705785e-118 | 1.118419e-115 | 255.20109 |
| 195 | CD44 | 0.4687010 | 1.2309836 | 17.705505 | 2.986698e-68 | 2.739753e-66 | 140.77385 |
| 315 | CD84 | -0.4014853 | 0.4955969 | -17.942543 | 5.442702e-70 | 5.201262e-68 | 144.76716 |
| 316 | CD53 | 0.4005713 | 0.8918330 | 17.744817 | 1.542205e-68 | 1.432646e-66 | 141.43284 |
| 389 | CD63 | -0.3767893 | 2.2570242 | -13.996693 | 9.252102e-44 | 4.594792e-42 | 84.59377 |
| 394 | CD36 | -0.3754074 | 0.3246596 | -14.368022 | 5.617585e-46 | 2.988506e-44 | 89.67381 |
| 520 | CD200 | 0.3326701 | 0.4247843 | 19.285397 | 3.159699e-80 | 3.648207e-78 | 168.27364 |
| 523 | CD109 | 0.3322049 | 0.8986139 | 14.935910 | 1.805819e-49 | 1.059211e-47 | 97.68097 |
| 570 | CD55 | -0.3204002 | 0.5686104 | -14.666228 | 8.527844e-48 | 4.824229e-46 | 93.84272 |
| 727 | CD164 | 0.2829355 | 1.8707070 | 11.305824 | 2.595364e-29 | 7.704212e-28 | 51.52489 |
| 842 | SCD | -0.2633034 | 0.6314002 | -13.023544 | 3.319280e-38 | 1.372757e-36 | 71.87000 |
| 1027 | CD79B | 0.2341851 | 0.3376392 | 15.483915 | 5.876647e-53 | 3.780161e-51 | 105.67854 |
| 1441 | C2CD2 | 0.1882607 | 0.4018778 | 11.341493 | 1.743786e-29 | 5.240091e-28 | 51.91960 |
| 1479 | HACD3 | -0.1857094 | 1.1536948 | -8.458455 | 3.440544e-17 | 5.721298e-16 | 23.88617 |
| 1711 | CD38 | -0.1694150 | 0.7399566 | -7.333030 | 2.578988e-13 | 3.285539e-12 | 15.09935 |

Characterizing Region 1 (R1) of the C+ Population¶

In this section, we answer the question: what makes the C+ Region 1 (R1) cells different from the remaining C+ cells? We do so by running differential gene expression (DGE) analysis and then supplementing it with a manual scrape of the top genes. Note that this repeats the previous workflow:

In [18]:
Cpos_R1_comparison <-
    perform_dge(sce_object = merge2_clean, target_pop = "Cpos", target_region = "R1")

# Print the top Genes differentially expressed in this region
head(Cpos_R1_comparison$top_genes, 20)

# Print the top Cluster Differentiating Genes differentially expressed in this region
head(Cpos_R1_comparison$top_cd_genes, 20)
Warning message:
"Zero sample variances detected, have been offset away from zero"
A data.frame: 20 × 7
      ID          logFC      AveExpr    t         P.Value        adj.P.Val      B
      <chr>       <dbl>      <dbl>      <dbl>     <dbl>          <dbl>          <dbl>
1     TOP2A       1.943276   1.3769076  45.74262  0.000000e+00   0.000000e+00   814.3644
2     HIST1H4C    1.925317   2.3573200  30.62303  2.306751e-183  1.918850e-180  405.4065
3     MKI67       1.817081   1.2989187  49.55594  0.000000e+00   0.000000e+00   926.0785
4     UBE2C       1.698571   0.9020936  46.68556  0.000000e+00   0.000000e+00   841.7910
5     HMGB2       1.661068   2.7763606  33.39165  2.178373e-213  2.847523e-210  474.4930
6     CENPF       1.462420   1.3749132  33.77545  1.161869e-217  1.575020e-214  484.3259
7     ASPM        1.425954   1.0068074  40.02830  1.822738e-290  8.339254e-287  651.8870
8     NUSAP1      1.408244   1.0801089  44.33975  0.000000e+00   0.000000e+00   773.8441
9     UBE2S       1.311736   1.3815108  34.03413  1.481856e-220  2.086054e-217  490.9864
10    HIST1H1B    1.299407   1.0095044  31.22222  9.759089e-190  9.399801e-187  420.0710
11    CENPE       1.273867   0.8845728  36.49646  1.193035e-248  3.358945e-245  555.6415
12    HMMR        1.245465   0.7621866  40.38427  9.113318e-295  4.765093e-291  661.7871
13    TUBB4B      1.237407   1.5383101  29.03084  8.702895e-167  5.491977e-164  367.2696
14    CDK1        1.234474   0.7712793  43.13791  0.000000e+00   0.000000e+00   739.4298
15    KPNA2       1.178221   1.0672469  34.18255  3.199156e-222  4.878847e-219  494.8198
16    TUBB        1.146956   2.8305532  27.33934  8.741665e-150  4.637010e-147  328.1595
17    TPX2        1.140368   0.9513845  35.00740  1.517589e-231  2.923436e-228  516.2768
18    KIF11       1.110784   0.7894748  41.02054  1.722484e-302  1.050744e-298  679.5653
19    SMC4        1.108021   1.6302565  31.62733  4.365967e-194  4.699963e-191  430.0783
20    SAMHD1      1.067395   2.0192066  16.11328  1.804498e-56   3.057705e-54   113.7024
A data.frame: 20 × 7
      ID          logFC       AveExpr     t           P.Value       adj.P.Val     B
      <chr>       <dbl>       <dbl>       <dbl>       <dbl>         <dbl>         <dbl>
287   CD34        -0.3903839  0.53207044  -12.983288  1.042245e-37  1.053790e-35  70.6924335
319   CD74         0.3669891  3.26015404    6.774873  1.447153e-11  3.894651e-10  11.0997499
495   CD52        -0.2986400  0.73682286   -8.457527  3.906943e-17  1.623133e-15  23.7144037
938   CDYL         0.2070244  1.07175479    6.374878  2.061648e-10  4.944847e-09   8.5001428
961   CD36         0.2044258  0.56109847    5.454332  5.244542e-08  9.282180e-07   3.1063561
1057  CD1D         0.1933482  0.35725796    7.229902  5.866953e-13  1.799969e-11  14.2443287
1135  CD2AP        0.1831994  1.01727529    5.433625  5.885873e-08  1.035218e-06   2.9945094
1168  CD37        -0.1793484  1.44754954   -5.119618  3.220804e-07  5.101603e-06   1.3498041
1189  CD84         0.1769242  0.39274125    7.846612  5.581901e-15  1.997098e-13  18.8226322
1257  CD1C         0.1695980  0.47744823    4.079163  4.617056e-05  5.027934e-04  -3.4088474
1258  CD53         0.1695372  1.29404331    4.877085  1.122918e-06  1.628365e-05   0.1455097
1285  SCD          0.1679191  0.65393698    6.177727  7.222710e-10  1.627823e-08   7.2757348
1502  CD96        -0.1498861  0.31276717   -6.038967  1.707456e-09  3.680482e-08   6.4365504
1562  CD226        0.1455231  0.12519030    9.265590  3.237663e-20  1.630009e-18  30.7248689
1670  CD86         0.1390596  0.41886951    4.933793  8.427443e-07  1.248797e-05   0.4219298
1690  CD180        0.1379317  0.34619455    6.177165  7.248197e-10  1.632562e-08   7.2722971
1880  HACD4        0.1278411  0.64434172    4.995050  6.159714e-07  9.331610e-06   0.7240626
1970  CD9          0.1240892  0.05155304    9.578395  1.766450e-21  9.535964e-20  33.6024257
2001  CDV3        -0.1222131  1.35935331   -4.080951  4.581838e-05  4.997016e-04  -3.4015833
2132  CD79B       -0.1168537  0.21083310   -7.258916  4.750914e-13  1.471135e-11  14.4515760
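
The `top_cd_genes` table above includes symbols such as SCD, HACD4, CDYL and CDV3, which are not cluster-of-differentiation markers, so the selection appears to match any symbol containing "CD". A minimal sketch of a stricter filter follows, assuming the intent is to keep only symbols of the form `CD` + digits + an optional letter. The toy `dge_table` and the pattern `cd_pattern` are hypothetical and are not taken from `perform_dge`; the logFC values are rounded from the table above.

```r
# Hypothetical stand-in for one result table; only the ID column matters here.
dge_table <- data.frame(
    ID = c("CD34", "CD74", "SCD", "HACD4", "CDYL", "CD79B", "CD2AP", "CD1D"),
    logFC = c(-0.39, 0.37, 0.17, 0.13, 0.21, -0.12, 0.18, 0.19)
)

# Loose match (assumed to resemble the current scrape): any symbol containing "CD".
loose_hits <- dge_table$ID[grepl("CD", dge_table$ID)]

# Stricter match: "CD", one or more digits, then at most one trailing letter.
cd_pattern <- "^CD[0-9]+[A-Z]?$"
strict_hits <- dge_table$ID[grepl(cd_pattern, dge_table$ID)]

print(loose_hits)
print(strict_hits)
```

A pattern like this still misses CD antigens whose gene symbols do not start with `CD` (PTPRC, which encodes CD45, is one example), so a curated marker list may be the safer choice if completeness matters.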

Characterizing Region 2 (R2) of the C+ Population¶

In this section, we answer the question: what makes the C+ Region 2 (R2) cells different from the remaining C+ cells? We do so by running differential gene expression (DGE) analysis and then supplementing it with a manual scrape of the top genes. Note that this repeats the previous workflow:

In [19]:
Cpos_R2_comparison <-
    perform_dge(sce_object = merge2_clean, target_pop = "Cpos", target_region = "R2")

# Print the top Genes differentially expressed in this region
head(Cpos_R2_comparison$top_genes, 20)

# Print the top Cluster Differentiating Genes differentially expressed in this region
head(Cpos_R2_comparison$top_cd_genes, 20)
Warning message:
"Zero sample variances detected, have been offset away from zero"
A data.frame: 20 × 7
      ID          logFC       AveExpr    t           P.Value       adj.P.Val     B
      <chr>       <dbl>       <dbl>      <dbl>       <dbl>         <dbl>         <dbl>
1     HIST1H4C    -1.2749375  2.3573200  -11.906050  4.331929e-32  8.750025e-28  58.269990
2     AFF3        -0.8891336  2.3235517   -9.048177  2.318076e-19  5.921326e-16  29.215236
3     HDAC9       -0.8515274  2.6566573  -10.257135  2.358134e-24  2.157752e-20  40.594449
4     HIST1H1B    -0.8467840  1.0095044  -11.897492  4.781304e-32  8.750025e-28  58.171945
5     MPO         -0.7439102  1.9990494   -5.437782  5.751320e-08  1.079508e-05   3.453652
6     HIST1H1D    -0.6271537  1.2377228   -9.333600  1.733501e-20  7.049763e-17  31.779463
7     RABGAP1L    -0.6140369  1.8144553   -9.360959  1.346667e-20  6.161169e-17  32.029248
8     RUNX2       -0.5982653  1.9477654   -6.855253  8.332255e-12  4.356698e-09  12.077480
9     BCL2        -0.5901856  1.9437992   -7.473339  9.747972e-14  8.108762e-11  16.444884
10    UBE2E2      -0.5896008  1.7408516   -9.484982  4.250084e-21  2.592622e-17  33.170273
11    WWOX        -0.5687601  1.8820647   -7.626114  3.071738e-14  2.810717e-11  17.580742
12    AUTS2       -0.5674274  1.8597347   -8.427869  5.011197e-17  6.793142e-14  23.905475
13    HIST1H1C    -0.5566508  0.8933185  -10.046564  1.922543e-23  1.407340e-19  38.515301
14    PRDX1        0.5466051  2.4794835    8.242752  2.326274e-16  2.778513e-13  22.391076
15    SFMBT2      -0.5393858  1.7277444   -7.182869  8.245688e-13  5.803854e-10  14.346801
16    PTTG1        0.5344782  1.1813705    9.150839  9.200361e-20  2.806187e-16  30.128793
17    DIAPH3      -0.5253225  1.0074927   -9.194185  6.209996e-20  2.066292e-16  30.517470
18    FCHSD2      -0.5165822  2.3827329   -7.594183  3.917356e-14  3.497052e-11  17.341499
19    DIAPH2      -0.5058354  1.7573645   -8.630524  8.998986e-18  1.646859e-14  25.600407
20    LINC01572   -0.5037277  1.1909193   -8.434224  4.751244e-17  6.688472e-14  23.958039
A data.frame: 20 × 7
      ID          logFC        AveExpr     t          P.Value       adj.P.Val     B
      <chr>       <dbl>        <dbl>       <dbl>      <dbl>         <dbl>         <dbl>
133   CD44        -0.28755418  1.66439588  -5.597538  2.335505e-08  5.058095e-06   4.32802737
151   CD52         0.27580557  0.73682286   5.015646  5.539139e-07  7.481108e-05   1.26319967
190   CD63         0.25708566  1.89852791   4.735291  2.271162e-06  2.490101e-04  -0.09512028
195   CD38        -0.25596597  1.04718520  -4.828010  1.436003e-06  1.684588e-04   0.34556992
206   CD34         0.25273801  0.53207044   5.330375  1.039926e-07  1.829919e-05   2.87979671
230   CD99         0.23948973  1.17319929   5.063733  4.316057e-07  6.122946e-05   1.50392221
336   CDYL        -0.21229202  1.07175479  -4.211074  2.603367e-05  1.745162e-03  -2.42775880
512   CD59         0.17919633  0.45740367   5.222297  1.866608e-07  3.009680e-05   2.31373576
652   CD81         0.16235847  1.13598289   3.510907  4.520228e-04  1.603148e-02  -5.12067109
714   CDV3         0.15469354  1.35935331   3.335462  8.601467e-04  2.281321e-02  -5.71947848
897   BICD1       -0.13941746  0.94689170  -2.968668  3.010523e-03  5.910957e-02  -6.87278265
939   C2CD5       -0.13669178  0.55794857  -3.662291  2.535397e-04  1.026074e-02  -4.57948931
1148  CD84        -0.12110820  0.39274125  -3.447401  5.724499e-04  1.708058e-02  -5.34094334
1281  CD4         -0.11476129  0.28359894  -3.038748  2.392579e-03  4.969965e-02  -6.66274769
1623  CD1D         0.09991974  0.35725796   2.399103  1.648548e-02  1.966064e-01  -8.39878191
1661  CD58         0.09834087  0.46969995   2.775606  5.538164e-03  9.264276e-02  -7.42617491
2099  CD79B        0.08226754  0.21083310   3.283514  1.034948e-03  2.614225e-02  -5.89092936
2198  TBCD        -0.07854332  0.45369966  -2.372639  1.771344e-02  2.060806e-01  -8.46184204
2545  CD99L2      -0.06261173  0.21312940  -2.487192  1.292024e-02  1.661608e-01  -8.18386370
2918  CD200R1     -0.04557340  0.09994318  -2.020258  4.343008e-02  3.701012e-01  -9.23500483
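
Every DGE cell in this notebook prints limma's "Zero sample variances detected, have been offset away from zero" warning. Genes whose expression is constant across all cells in a comparison are one likely source. The sketch below shows one way to drop such genes before fitting, under the assumption that the input is a genes-by-cells matrix; `expr` and its values are hypothetical toy data, and this is not a description of what `perform_dge` does internally.

```r
# Hypothetical toy matrix: genes in rows, cells in columns.
expr <- rbind(
    gene1 = c(0, 1, 2, 0, 3, 1),
    gene2 = c(0, 0, 0, 0, 0, 0),  # never detected
    gene3 = c(5, 4, 6, 5, 7, 4),
    gene4 = c(2, 2, 2, 2, 2, 2)   # constant, so variance is also zero
)

# Per-gene variance across cells; constant genes have variance exactly 0.
gene_var <- apply(expr, 1, var)

# Keep only genes that vary; drop = FALSE preserves the matrix shape.
expr_filtered <- expr[gene_var > 0, , drop = FALSE]

print(rownames(expr_filtered))
```

On a large sparse matrix, `apply` forces a dense copy, so a sparse-aware row variance would be preferable there; the base R version is used here only to keep the sketch self-contained.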

Characterizing Region 3 (R3) of the C+ Population¶

In this section, we answer the question: what makes the C+ Region 3 (R3) cells different from the remaining C+ cells? We do so by running differential gene expression (DGE) analysis and then supplementing it with a manual scrape of the top genes. Note that this repeats the previous workflow:

In [20]:
Cpos_R3_comparison <-
    perform_dge(sce_object = merge2_clean, target_pop = "Cpos", target_region = "R3")

# Print the top Genes differentially expressed in this region
head(Cpos_R3_comparison$top_genes, 20)

# Print the top Cluster Differentiating Genes differentially expressed in this region
head(Cpos_R3_comparison$top_cd_genes, 20)
Warning message:
"Zero sample variances detected, have been offset away from zero"
A data.frame: 20 × 7
      ID          logFC      AveExpr    t          P.Value        adj.P.Val      B
      <chr>       <dbl>      <dbl>      <dbl>      <dbl>          <dbl>          <dbl>
1     CST3         2.844294  2.1637611   39.57353  5.421427e-285  9.921483e-281  639.3532
2     S100A6       2.082420  1.6373426   38.37640  1.051391e-270  1.282732e-266  606.4672
3     S100A10      2.045921  1.0808110   38.29754  9.047193e-270  8.278407e-266  604.3157
4     LYZ          2.007371  1.5582085   26.15018  3.107731e-138  2.146152e-135  301.6571
5     SAMHD1       1.901196  2.0192066   28.80481  1.773546e-164  1.803154e-161  362.0221
6     S100A4       1.889538  3.3488066   36.22702  1.572152e-245  8.220334e-242  548.5259
7     HLA-DRA      1.708710  3.2477191   30.21503  4.584386e-179  6.453581e-176  395.5820
8     VIM          1.691064  3.3620246   31.65851  2.013729e-194  3.879185e-191  430.9163
9     HLA-DPA1     1.674286  2.1864050   32.17635  4.959687e-200  1.008497e-196  443.8214
10    HLA-DPB1     1.604700  1.9964030   30.14867  2.276424e-178  3.085904e-175  393.9807
11    SLC8A1       1.603701  0.8390865   35.40450  4.499459e-236  1.829830e-232  526.7621
12    MPO         -1.565697  1.9990494  -17.19247  9.564979e-64   1.321086e-61   130.4648
13    COTL1        1.554372  1.9108972   33.24414  9.408496e-212  2.459717e-208  470.7944
14    HLA-DRB1     1.507356  2.4250991   29.08263  2.564059e-167  2.760209e-164  368.5552
15    S100A11      1.502511  1.8005538   32.69560  1.050033e-205  2.260721e-202  456.8780
16    ANXA2        1.477705  1.0167411   33.83245  2.681142e-218  8.177706e-215  485.8561
17    CD74         1.467318  3.2601540   27.78502  3.352864e-154  2.921861e-151  338.3823
18    PLXDC2       1.466705  0.9928553   33.11150  2.759646e-210  6.733719e-207  467.4179
19    RTN1         1.419965  0.4684784   40.34251  2.916659e-294  1.067526e-289  660.6889
20    KCNQ5       -1.417320  2.3019023  -22.74562  3.613062e-107  1.520019e-104  230.2260
A data.frame: 20 × 7
      ID          logFC       AveExpr     t           P.Value        adj.P.Val      B
      <chr>       <dbl>       <dbl>       <dbl>       <dbl>          <dbl>          <dbl>
17    CD74         1.4673176  3.26015404   27.785018  3.352864e-154  2.921861e-151  338.38232
39    CD1C         1.1770754  0.47744823   29.475448  2.316241e-171  2.568992e-168  377.85896
58    CD36         0.9897158  0.56109847   27.015999  1.306573e-146  1.062708e-143  320.92243
106   CD48         0.8113073  1.29148444   23.129594  1.627070e-110  7.352146e-108  237.91910
111   CD86         0.7988978  0.41886951   29.480308  2.063198e-171  2.359847e-168  377.97455
159   CD1D         0.6891653  0.35725796   26.143265  3.619463e-138  2.453259e-135  301.50488
278   CD37         0.5373237  1.44754954   14.747038  7.626808e-48   6.583698e-46    93.98203
309   CD34        -0.5176654  0.53207044  -16.343751  5.449568e-58   6.312014e-56   117.25487
323   CD4          0.5098941  0.28359894   20.678192  6.938321e-90   1.853646e-87   190.50256
449   CD302        0.4410703  1.10428854   13.827663  2.035842e-42   1.484340e-40    81.54418
550   CD63         0.3935811  1.89852791   10.644307  4.481249e-26   1.665159e-24    44.15140
602   CD38        -0.3743976  1.04718520  -10.359351  8.392601e-25   2.914398e-23    41.24643
631   CD53         0.3648039  1.29404331    9.935960  5.695404e-23   1.815832e-21    37.06760
684   HACD3       -0.3519215  0.92054313  -11.397410  1.366389e-29   6.010961e-28    52.18403
909   CD96        -0.2985389  0.31276717  -11.416678  1.103400e-29   4.889289e-28    52.39626
986   HACD4        0.2869140  0.64434172   10.632637  5.059863e-26   1.874454e-24    44.03097
1029  CD1E         0.2804092  0.10755063   15.812364  1.633185e-54   1.712785e-52   109.27794
1186  CD200R1      0.2585260  0.09994318   17.283109  2.246267e-64   3.211547e-62   131.90922
1197  CD68         0.2573989  0.18316388   12.989085  9.695052e-38   6.014383e-36    70.82912
1531  CD99        -0.2208288  1.17319929   -6.790842  1.297458e-11   1.922601e-10    11.27148

Characterizing Region 4 (R4) of the C+ Population¶

In this section, we answer the question: what makes the C+ Region 4 (R4) cells different from the remaining C+ cells? We do so by running differential gene expression (DGE) analysis and then supplementing it with a manual scrape of the top genes. Note that this repeats the previous workflow:

In [21]:
Cpos_R4_comparison <-
    perform_dge(sce_object = merge2_clean, target_pop = "Cpos", target_region = "R4")

# Print the top Genes differentially expressed in this region
head(Cpos_R4_comparison$top_genes, 20)

# Print the top Cluster Differentiating Genes differentially expressed in this region
head(Cpos_R4_comparison$top_cd_genes, 20)

# print it as csv
Warning message:
"Zero sample variances detected, have been offset away from zero"
A data.frame: 20 × 7
      ID          logFC       AveExpr    t           P.Value        adj.P.Val      B
      <chr>       <dbl>       <dbl>      <dbl>       <dbl>          <dbl>          <dbl>
1     MPO          2.8475984  1.9990494   19.302327  4.221984e-79   1.716987e-75   166.21781
2     PRTN3        2.3095339  0.4257153   26.606883  1.250049e-142  2.287651e-138  312.25834
3     AZU1         1.7662224  0.2779040   30.812220  2.281937e-185  8.352117e-181  410.57526
4     LRMDA        1.5977458  2.2763890   16.870219  1.567365e-61   3.824475e-58   125.87388
5     FNDC3B       1.3983028  1.2114140   19.801434  6.052030e-83   3.164433e-79   175.04751
6     SERPINB1     1.3156648  1.9682292   16.123110  1.555673e-56   3.163289e-53   114.40711
7     CST3        -1.2797978  2.1637611   -9.201877  5.790519e-20   7.623697e-18    30.70691
8     HIST1H4C    -1.2525414  2.3573200  -10.322318  1.221591e-24   2.470246e-22    41.36658
9     LYST         1.1831815  1.1446418   16.114164  1.780624e-56   3.430138e-53   114.27253
10    PRSS57       1.1804039  1.0461957   15.444283  3.656454e-52   5.818690e-49   104.37935
11    TUBB4B      -1.1790977  1.5383101  -14.695370  1.568426e-47   2.207921e-44    93.75628
12    MKI67       -1.1584932  1.2989187  -14.368750  1.420513e-45   1.856865e-42    89.26997
13    TOP2A       -1.1223199  1.3769076  -12.301026  4.243544e-34   2.098891e-31    62.98637
14    TUBA1B      -1.0974248  3.9090800  -12.982430  1.053470e-37   7.010557e-35    71.23871
15    S100A4      -1.0880403  3.3488066  -11.111343  3.140980e-28   8.530052e-26    49.56462
16    HMGB2       -1.0364533  2.7763606  -10.599882  7.110184e-26   1.626499e-23    44.18586
17    SRGN         0.9886033  1.9498024   13.199777  6.853482e-39   5.119272e-36    73.95629
18    HIST1H1B    -0.9691742  1.0095044  -12.080453  5.714418e-33   2.550652e-30    60.40262
19    S100A6      -0.9655470  1.6373426   -9.281096  2.808909e-20   3.969455e-18    31.42235
20    S100A10     -0.9496436  1.0808110   -9.277406  2.905548e-20   4.074558e-18    31.38889
A data.frame: 20 × 7
      ID          logFC       AveExpr    t          P.Value       adj.P.Val     B
      <chr>       <dbl>       <dbl>      <dbl>      <dbl>         <dbl>         <dbl>
63    CD74        -0.7150619  3.2601540  -7.575058  4.529492e-14  3.475554e-12  17.3189563
277   CD52        -0.4388130  0.7368229  -7.099802  1.496502e-12  9.694419e-11  13.8819585
311   CD164       -0.4225014  1.4962513  -7.157294  9.913394e-13  6.609110e-11  14.2862342
339   CD1C        -0.4094522  0.4774482  -5.654342  1.685336e-08  6.432221e-07   4.7652160
357   CD36        -0.4036869  0.5610985  -6.178264  7.198422e-10  3.322440e-08   7.8360348
397   CD53        -0.3805762  1.2940433  -6.286188  3.640248e-10  1.767065e-08   8.5016877
564   CD44         0.3305364  1.6643959   5.705780  1.250929e-08  4.844003e-07   5.0548880
672   CD84         0.3025318  0.3927413   7.684825  1.959387e-14  1.579637e-12  18.1434574
678   CD38         0.3009633  1.0471852   5.034617  5.021197e-07  1.451665e-05   1.4781942
749   CD1D        -0.2890853  0.3572580  -6.181621  7.048548e-10  3.265619e-08   7.8565685
882   CD96         0.2637392  0.3127672   6.088744  1.256668e-09  5.568439e-08   7.2924556
1035  CD81        -0.2430534  1.1359829  -4.666111  3.180509e-06  7.719483e-05  -0.2981373
1157  HACD3       -0.2283692  0.9205431  -4.454647  8.656844e-06  1.928479e-04  -1.2572510
1160  CD63        -0.2279814  1.8985279  -3.718789  2.032093e-04  3.019758e-03  -4.2514083
1189  CD86        -0.2250331  0.4188695  -4.572334  4.984328e-06  1.156826e-04  -0.7288949
1254  CD48        -0.2156195  1.2914844  -3.513896  4.469848e-04  5.934019e-03  -4.9899092
1445  CD302        0.1958048  1.1042885   3.664205  2.516585e-04  3.608634e-03  -4.4522048
1657  CD300A      -0.1778407  0.3300785  -4.886472  1.071039e-06  2.910252e-05   0.7480713
1892  CD47        -0.1623991  1.1277816  -3.178094  1.494940e-03  1.674305e-02  -6.1103392
2162  CD34        -0.1471191  0.5320704  -2.743194  6.114397e-03  5.341123e-02  -7.3951513
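
The cell above ends with the comment `# print it as csv` but has no export code attached. A minimal sketch of how every comparison could be exported in one loop follows. It assumes each comparison object is a list with `top_genes` and `top_cd_genes` data frames, as the `$` accessors in this notebook suggest. The helper `make_result`, the placeholder logFC values, the output directory, and the file-name scheme are all hypothetical.

```r
# Hypothetical stand-ins for objects returned by perform_dge();
# the logFC values are placeholders, not real results.
make_result <- function(ids) {
    list(
        top_genes = data.frame(ID = ids, logFC = seq_along(ids)),
        top_cd_genes = data.frame(ID = ids[1], logFC = 1)
    )
}
comparisons <- list(
    Cpos_R4 = make_result(c("MPO", "PRTN3")),
    Cpos_R5 = make_result(c("LTB", "TCF4"))
)

# Write into a temporary directory so the sketch leaves no files behind.
out_dir <- file.path(tempdir(), "dge_tables")
dir.create(out_dir, showWarnings = FALSE)

written <- character(0)
for (name in names(comparisons)) {
    for (tbl in c("top_genes", "top_cd_genes")) {
        path <- file.path(out_dir, paste0(name, "_", tbl, ".csv"))
        # Row names are kept (the default) because they carry each gene's rank.
        write.csv(comparisons[[name]][[tbl]], path)
        written <- c(written, path)
    }
}

print(basename(written))
```

In the notebook itself, `comparisons` would be a named list of the real `*_comparison` objects, and `out_dir` a project folder instead of `tempdir()`.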

Characterizing Region 5 (R5) of the C+ Population¶

In this section, we answer the question: what makes the C+ Region 5 (R5) cells different from the remaining C+ cells? We do so by running differential gene expression (DGE) analysis and then supplementing it with a manual scrape of the top genes. Note that this repeats the previous workflow:

In [22]:
Cpos_R5_comparison <-
    perform_dge(sce_object = merge2_clean, target_pop = "Cpos", target_region = "R5")

# Print the top Genes differentially expressed in this region
head(Cpos_R5_comparison$top_genes, 20)

# Print the top Cluster Differentiating Genes differentially expressed in this region
head(Cpos_R5_comparison$top_cd_genes, 20)
Warning message:
"Zero sample variances detected, have been offset away from zero"
A data.frame: 20 × 7
      ID          logFC      AveExpr    t          P.Value        adj.P.Val      B
      <chr>       <dbl>      <dbl>      <dbl>      <dbl>          <dbl>          <dbl>
1     LTB          3.106609  1.2740898   28.62571  1.176798e-162  1.305211e-159  358.3944
2     TUBA1B      -2.921061  3.9090800  -36.86598  6.033597e-253  3.154796e-249  566.1566
3     TCF4         2.816225  1.7692575   25.91093  5.949752e-136  4.032720e-133  296.9723
4     JCHAIN       2.804626  0.4055437   44.25181  0.000000e+00   0.000000e+00   771.9434
5     H2AFZ       -2.686776  3.7948555  -34.87649  4.659122e-230  1.894762e-226  513.4821
6     LINC01478    2.435756  0.2620662   42.21482  4.191758e-317  7.671126e-313  713.8319
7     HMGB2       -2.414741  2.7763606  -24.42964  3.783900e-122  2.036684e-119  265.2299
8     CCDC50       2.359768  1.1358201   27.21492  1.465646e-148  1.277240e-145  325.9713
9     CUX2         2.315946  0.4353410   39.75071  4.023346e-287  3.681462e-283  644.8176
10    UGCG         2.248940  0.7868915   27.39955  2.226463e-150  1.987580e-147  330.1539
11    AC023590.1   2.203652  0.4591193   31.18060  2.719170e-189  4.739254e-186  419.6750
12    NIBAN3       2.148335  0.4199959   33.66580  1.943175e-216  5.080153e-213  482.1385
13    LINC01374    2.103218  0.3242030   38.96610  1.014193e-277  6.186746e-274  623.1778
14    RABGAP1L     2.059722  1.8144553   28.29074  2.874816e-159  2.843814e-156  350.6009
15    TPM2         2.043972  0.2870637   39.14803  6.806123e-280  4.982218e-276  628.1799
16    KCNQ5       -2.031652  2.3019023  -18.12351  2.430731e-70   5.264331e-68   146.1699
17    CARD11       1.994982  0.6294967   30.34581  1.937617e-180  2.836749e-177  399.3065
18    IRF8         1.985258  1.2900046   20.16680  8.289671e-86   2.638350e-83   181.6977
19    FCHSD2       1.973939  2.3827329   25.85706  1.933508e-135  1.263720e-132  295.7951
20    LRMDA       -1.914075  2.2763890  -18.99255  9.357970e-77   2.429156e-74   160.9015
A data.frame: 20 × 7
      ID          logFC       AveExpr    t           P.Value       adj.P.Val     B
      <chr>       <dbl>       <dbl>      <dbl>       <dbl>         <dbl>         <dbl>
62    CD74         1.4602567  3.2601540   14.705108  1.369381e-47  1.369418e-45   93.963105
164   CD2AP        1.0363949  1.0172753   16.961146  3.749002e-62  6.323375e-60  127.371493
228   CD164        0.9516593  1.4962513   15.378149  9.554399e-52  1.128066e-49  103.494256
281   CD37         0.8932082  1.4475495   13.904418  7.371545e-43  6.131952e-41   83.118993
351   CD63        -0.8293675  1.8985279  -12.850532  5.421698e-37  3.562649e-35   69.681283
359   CDYL         0.8201088  1.0717548   13.735999  6.804267e-42  5.497637e-40   80.907333
439   CD53         0.7633746  1.2940433   11.897958  4.755664e-32  2.497304e-30   58.369259
556   HACD3       -0.6845285  0.9205431  -12.665119  5.287620e-36  3.331018e-34   67.416855
731   CD81        -0.5988106  1.1359829  -10.839106  5.796802e-27  2.326412e-25   46.743790
788   CD38        -0.5814162  1.0471852   -9.123921  1.173382e-19  3.193083e-18   30.080301
797   CDT1        -0.5783227  0.6152767  -12.768688  1.487135e-36  9.599759e-35   68.678018
854   CD34        -0.5545999  0.5320704   -9.739828  3.801655e-22  1.191305e-20   35.751696
1091  CD4          0.4853419  0.2835989   10.759487  1.342812e-26  5.313326e-25   45.910539
1186  SCD         -0.4639307  0.6539370   -9.158297  8.599748e-20  2.350705e-18   30.387537
1470  CD36         0.4080784  0.5610985    5.809341  6.812367e-09  8.050999e-08    5.717626
1497  CD33        -0.4029567  0.5386984   -8.939800  6.084250e-19  1.574891e-17   28.453509
1522  CD48        -0.3963579  1.2914844   -6.031739  1.784833e-09  2.239516e-08    7.022078
1733  HACD1       -0.3655580  0.3836370   -9.496355  3.820853e-21  1.126890e-19   33.467614
1832  CD59        -0.3523438  0.4574037   -8.526151  2.189492e-17  5.166834e-16   24.914636
1835  CD180        0.3519795  0.3461945    8.443808  4.384175e-17  1.013678e-15   24.229380

Characterizing Region 1 (R1) of the Ra-C- Population¶

In this section, we answer the question: what makes the Ra-C- Region 1 (R1) cells different from the remaining Ra-C- cells? We do so by running differential gene expression (DGE) analysis and then supplementing it with a manual scrape of the top genes. Note that this repeats the previous workflow:

In [23]:
Raneg_Cneg_R1_comparison <-
    perform_dge(sce_object = merge2_clean, target_pop = "Raneg_Cneg", target_region = "R1")

# Print the top Genes differentially expressed in this region
head(Raneg_Cneg_R1_comparison$top_genes, 20)

# Print the top Cluster Differentiating Genes differentially expressed in this region
head(Raneg_Cneg_R1_comparison$top_cd_genes, 20)

write.csv(Raneg_Cneg_R1_comparison$top_genes, "topGenes_Region1.csv")
write.csv(Raneg_Cneg_R1_comparison$top_cd_genes, "topCD_Region1.csv")
Warning message in asMethod(object):
"sparse->dense coercion: allocating vector of size 1.5 GiB"
Warning message:
"Zero sample variances detected, have been offset away from zero"
A data.frame: 20 × 7
      ID          logFC      AveExpr    t          P.Value        adj.P.Val      B
      <chr>       <dbl>      <dbl>      <dbl>      <dbl>          <dbl>          <dbl>
1     HBD          2.600558  0.9332628   60.29083  0.000000e+00   0.000000e+00   1381.5478
2     STXBP5       2.387527  1.9199339   69.60401  0.000000e+00   0.000000e+00   1722.4838
3     ITGA2B       2.281450  0.7618509   92.47893  0.000000e+00   0.000000e+00   2561.5289
4     LTBP1        2.165247  0.6594841   65.40463  0.000000e+00   0.000000e+00   1567.9860
5     GP1BB        2.099730  0.6598842   72.72827  0.000000e+00   0.000000e+00   1837.8255
6     RAP1B        1.979228  2.1936284   70.07873  0.000000e+00   0.000000e+00   1739.9956
7     ABCC4        1.925287  1.0928290   72.21383  0.000000e+00   0.000000e+00   1818.8220
8     SPINK2      -1.924872  1.7426931  -69.27794  0.000000e+00   0.000000e+00   1710.4599
9     PLCB1       -1.875570  1.8908794  -59.28000  0.000000e+00   0.000000e+00   1345.0226
10    RNF220      -1.828865  2.5707110  -57.12300  0.000000e+00   0.000000e+00   1267.5479
11    SLC24A3      1.793877  0.6780904   57.21880  0.000000e+00   0.000000e+00   1270.9741
12    MED12L       1.747457  0.8803669   63.72476  0.000000e+00   0.000000e+00   1506.4814
13    C1QTNF4     -1.742253  1.5501189  -60.94412  0.000000e+00   0.000000e+00   1405.2200
14    PLXDC2       1.705843  0.9520478   58.30154  0.000000e+00   0.000000e+00   1309.7954
15    SH3BGRL3     1.699835  2.6590816   58.64659  0.000000e+00   0.000000e+00   1322.2032
16    ATP8B4      -1.679106  1.6559922  -52.94559  0.000000e+00   0.000000e+00   1119.7040
17    NKAIN2      -1.664888  2.7157017  -37.30191  1.739818e-271  3.557491e-269   608.1647
18    RAB27B       1.629171  0.9554767   58.10343  0.000000e+00   0.000000e+00   1302.6790
19    UBE2C        1.595314  0.9767416   53.48448  0.000000e+00   0.000000e+00   1138.5884
20    LAT          1.566811  0.4916719   72.43056  0.000000e+00   0.000000e+00   1826.8277
A data.frame: 20 × 7
      ID          logFC       AveExpr    t          P.Value        adj.P.Val      B
      <chr>       <dbl>       <dbl>      <dbl>      <dbl>          <dbl>          <dbl>
33    CD74        -1.3886235  2.2728535  -44.70930  0.000000e+00   0.000000e+00    840.1121
95    CD99        -1.0149487  1.4580142  -43.25054  0.000000e+00   0.000000e+00    792.7357
99    CD84         0.9984581  0.4955969   53.73067  0.000000e+00   0.000000e+00   1147.2357
107   CD48        -0.9684666  0.8941089  -44.67238  0.000000e+00   0.000000e+00    838.9039
126   CD63         0.9135788  2.2570242   37.46660  1.289011e-273  2.759011e-271   613.0672
142   CD36         0.8744668  0.3246596   36.77724  9.774527e-265  1.873076e-262   592.6316
143   CD34        -0.8677303  1.3150180  -35.24866  2.094257e-245  3.376736e-243   548.1467
177   CD44        -0.8126885  1.2309836  -32.72168  1.334583e-214  1.732166e-212   477.2655
195   CD55         0.7716636  0.5686104   39.33549  3.607161e-298  9.105220e-296   669.5748
275   CD52        -0.6601983  1.5512427  -23.33758  6.625781e-115  3.587429e-113   247.9604
318   CD53        -0.6224523  0.8918330  -28.84401  2.102008e-170  1.904346e-168   375.5861
529   CD164       -0.4921140  1.8707070  -20.19441  1.613185e-87   6.567762e-86    185.0219
552   CD200       -0.4793880  0.4247843  -28.94135  1.828026e-171  1.681095e-169   378.0259
659   BICD1       -0.4351051  1.0190811  -19.46995  1.103106e-81   4.153786e-80    171.6179
839   CD69         0.3802578  0.6807524   16.06343  9.060894e-57   2.367150e-55    114.4202
847   CD37        -0.3784251  1.4375666  -16.07689  7.365320e-57   1.926934e-55    114.6266
970   CD9          0.3453646  0.1117994   24.97147  2.049375e-130  1.293262e-128   283.6182
992   CD226        0.3390867  0.1304901   27.69840  4.042864e-158  3.302966e-156   347.3312
1017  CD82         0.3335424  0.5675827   17.01665  2.590232e-63   7.512289e-62    129.4362
1041  CD302       -0.3262979  0.5838855  -18.37965  2.984599e-73   1.005887e-71    152.2521
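
Many rows above report `P.Value` and `adj.P.Val` as `0.000000e+00`. That is floating-point underflow, not a true zero, so those genes cannot be ranked by p-value. One workaround is to compute the p-value on the log scale directly from the t statistic. The sketch below uses three t values copied from the top-genes table above. The degrees of freedom `df_assumed` is a made-up placeholder, because the real value depends on the number of cells and on limma's moderation, so the output will not reproduce the table.

```r
# t statistics copied from the Ra-C- R1 top-genes table above.
t_stat <- c(HBD = 60.29083, ITGA2B = 92.47893, NKAIN2 = -37.30191)

# Placeholder degrees of freedom (assumption; not taken from the notebook).
df_assumed <- 5000

# Two-sided p-value on the log10 scale:
# log10(2 * P(T <= -|t|)) = (log(2) + log P(T <= -|t|)) / log(10)
log10_p <- (log(2) + pt(-abs(t_stat), df = df_assumed, log.p = TRUE)) / log(10)

# The direct computation underflows for the largest statistics.
p_direct <- 2 * pt(-abs(t_stat), df = df_assumed)

print(log10_p)
print(p_direct)
```

Ranking by `log10_p` (or simply by `abs(t)`) keeps the ordering among genes whose printed p-values are all zero.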

Characterizing Region 2 (R2) of the Ra-C- Population¶

In this section, we answer the question: what makes the Ra-C- Region 2 (R2) cells different from the remaining Ra-C- cells? We do so by running differential gene expression (DGE) analysis and then supplementing it with a manual scrape of the top genes. Note that this repeats the previous workflow:

In [24]:
Raneg_Cneg_R2_comparison <-
    perform_dge(sce_object = merge2_clean, target_pop = "Raneg_Cneg", target_region = "R2")

# Print the top Genes differentially expressed in this region
head(Raneg_Cneg_R2_comparison$top_genes, 20)

# Print the top Cluster Differentiating Genes differentially expressed in this region
head(Raneg_Cneg_R2_comparison$top_cd_genes, 20)
Warning message in asMethod(object):
"sparse->dense coercion: allocating vector of size 1.5 GiB"
Warning message:
"Zero sample variances detected, have been offset away from zero"
A data.frame: 20 × 7
      ID          logFC       AveExpr    t          P.Value        adj.P.Val      B
      <chr>       <dbl>       <dbl>      <dbl>      <dbl>          <dbl>          <dbl>
1     NKAIN2       1.6173566  2.7157017   35.88455  2.191341e-253  2.673509e-249  566.5133
2     HIST1H4C    -1.3112881  2.2538890  -28.05665  6.383785e-162  3.894215e-159  356.0775
3     HMGB2       -1.2863952  2.6003865  -38.38509  1.345418e-285  2.462181e-281  640.6437
4     TOP2A       -1.1748550  1.2552635  -34.55373  8.701558e-237  3.981071e-233  528.3167
5     UBE2C       -1.1563485  0.9767416  -34.60196  2.213043e-237  1.157137e-233  529.6850
6     TUBB4B      -1.1293153  1.5703254  -35.09399  1.772643e-243  1.297610e-239  543.7134
7     HBD         -1.1049158  0.9332628  -20.55624  1.679011e-90   2.301628e-88   191.8769
8     HLA-DRA      1.0408271  2.3423770   26.37118  2.330121e-144  9.072847e-142  315.6858
9     MSI2         1.0300816  2.5742865   30.68656  6.450141e-191  1.026442e-187  422.7771
10    RNF220       1.0277936  2.5707110   26.97889  1.350358e-150  6.101786e-148  330.0294
11    UBE2S       -1.0216306  1.3800974  -33.73812  8.204519e-227  2.144954e-223  505.3652
12    SPINK2       1.0212230  1.7426931   28.70188  7.356749e-169  5.609675e-166  372.0369
13    HIST1H1B    -1.0188295  1.0765159  -29.62785  5.103361e-179  5.336803e-176  395.4053
14    HOPX         1.0046959  1.1811817   35.59054  1.100540e-249  1.007022e-245  557.9968
15    MKI67       -0.9951843  1.0553881  -34.98354  4.186815e-242  2.554027e-238  540.5534
16    CENPF       -0.9712544  1.3730378  -29.45835  3.845265e-177  3.703698e-174  391.0874
17    CD74         0.9527469  2.2728535   28.00636  2.192806e-161  1.294498e-158  354.8448
18    PLCB1        0.9493143  1.8908794   24.61417  5.956410e-127  1.639177e-124  275.6575
19    RRM2        -0.9471306  0.8967019  -38.44819  1.993591e-286  7.296741e-282  642.5521
20    NRIP1        0.9264414  2.4574898   29.47771  2.349684e-177  2.324345e-174  391.5795
A data.frame: 20 × 7
      ID          logFC       AveExpr    t           P.Value        adj.P.Val      B
      <chr>       <dbl>       <dbl>      <dbl>       <dbl>          <dbl>          <dbl>
17    CD74         0.9527469  2.2728535   28.006355  2.192806e-161  1.294498e-158  354.84484
45    CD52         0.7753846  1.5512427   27.875007  5.459927e-160  3.122481e-157  351.63352
64    CD99         0.6737835  1.4580142   26.238619  5.168612e-143  1.950272e-140  312.59046
100   CD48         0.5969166  0.8941089   24.800467  9.422292e-129  2.715475e-126  279.79801
136   CD34         0.5490061  1.3150180   20.871016  3.923522e-93   5.544588e-91   197.92294
138   CD37         0.5443091  1.4375666   23.676609  4.705785e-118  1.118419e-115  255.20109
195   CD44         0.4687010  1.2309836   17.705505  2.986698e-68   2.739753e-66   140.77385
315   CD84        -0.4014853  0.4955969  -17.942543  5.442702e-70   5.201262e-68   144.76716
316   CD53         0.4005713  0.8918330   17.744817  1.542205e-68   1.432646e-66   141.43284
389   CD63        -0.3767893  2.2570242  -13.996693  9.252102e-44   4.594792e-42    84.59377
394   CD36        -0.3754074  0.3246596  -14.368022  5.617585e-46   2.988506e-44    89.67381
520   CD200        0.3326701  0.4247843   19.285397  3.159699e-80   3.648207e-78   168.27364
523   CD109        0.3322049  0.8986139   14.935910  1.805819e-49   1.059211e-47    97.68097
570   CD55        -0.3204002  0.5686104  -14.666228  8.527844e-48   4.824229e-46    93.84272
727   CD164        0.2829355  1.8707070   11.305824  2.595364e-29   7.704212e-28    51.52489
842   SCD         -0.2633034  0.6314002  -13.023544  3.319280e-38   1.372757e-36    71.87000
1027  CD79B        0.2341851  0.3376392   15.483915  5.876647e-53   3.780161e-51   105.67854
1441  C2CD2        0.1882607  0.4018778   11.341493  1.743786e-29   5.240091e-28    51.91960
1479  HACD3       -0.1857094  1.1536948   -8.458455  3.440544e-17   5.721298e-16    23.88617
1711  CD38        -0.1694150  0.7399566   -7.333030  2.578988e-13   3.285539e-12    15.09935
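
Both tables printed for `Raneg_Cneg_R2_comparison` match, row for row, the tables printed earlier for `raPosCneg_R2_comparison`. That may be intentional (for example, two names for the same comparison), but if it is not, a quick equality check can catch it. The sketch below assumes the result tables are plain data frames; `tbl_a`, `tbl_b`, and `tbl_c` are hypothetical stand-ins built from a few rows of the output above.

```r
# Hypothetical stand-ins for two top_genes tables that should differ.
tbl_a <- data.frame(
    ID = c("NKAIN2", "HIST1H4C", "HMGB2"),
    logFC = c(1.6173566, -1.3112881, -1.2863952),
    t = c(35.88455, -28.05665, -38.38509)
)
tbl_b <- tbl_a  # an exact copy, mimicking the duplicated output

# A table that differs in one value, for contrast.
tbl_c <- tbl_a
tbl_c$logFC[1] <- -1.6648880

# all.equal() returns TRUE or a description of differences, so wrap it in isTRUE().
same_ab <- isTRUE(all.equal(tbl_a, tbl_b))
same_ac <- isTRUE(all.equal(tbl_a, tbl_c))

if (same_ab) {
    message("tbl_a and tbl_b are identical; check target_pop and target_region.")
}
print(c(same_ab = same_ab, same_ac = same_ac))
```

Running the same check on the real `$top_genes` tables of any two comparisons would show whether they were fit on the same cells.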

Characterizing Region 3 (R3) of the Ra-C- Population¶

In this section, we answer the question: what makes the Ra-C- Region 3 (R3) cells different from the remaining Ra-C- cells? We do so by running differential gene expression (DGE) analysis and then supplementing it with a manual scrape of the top genes. Note that this repeats the previous workflow:

In [25]:
Raneg_Cneg_R3_comparison <-
    perform_dge(sce_object = merge2_clean, target_pop = "Raneg_Cneg", target_region = "R3")

# Print the top Genes differentially expressed in this region
head(Raneg_Cneg_R3_comparison$top_genes, 20)

# Print the top Cluster Differentiating Genes differentially expressed in this region
head(Raneg_Cneg_R3_comparison$top_cd_genes, 20)
Warning message in asMethod(object):
"sparse->dense coercion: allocating vector of size 1.5 GiB"
Warning message:
"Zero sample variances detected, have been offset away from zero"
A data.frame: 20 × 7
      ID          logFC       AveExpr    t           P.Value       adj.P.Val     B
      <chr>       <dbl>       <dbl>      <dbl>       <dbl>         <dbl>         <dbl>
1     FHIT        -1.4255407  1.9786639  -19.698355  1.670403e-83  3.056922e-79  176.56335
2     WWOX        -1.3788230  2.2088429  -18.252876  2.677836e-72  1.960230e-68  150.82943
3     NKAIN2      -1.3215391  2.7157017  -12.467633  3.368158e-35  4.109265e-32   65.75139
4     RAD51B      -1.3124129  1.8692422  -21.187735  8.163596e-96  2.987958e-91  204.84806
5     IMMP2L      -1.2600372  1.9230808  -19.037507  2.739828e-78  2.507012e-74  164.58520
6     LRMDA       -1.1954152  2.1615629  -17.548987  4.097231e-67  2.499379e-63  138.92597
7     SMYD3       -1.1305355  2.3841554  -19.496263  6.822155e-82  8.323256e-78  172.86257
8     MALAT1      -1.0991620  5.8601224  -17.163512  2.372408e-64  9.648056e-61  132.58439
9     STXBP5      -1.0934012  1.9199339  -10.908477  2.011331e-27  1.840419e-24   47.97152
10    HMGB2        1.0885756  2.6003865   13.663350  8.139813e-42  1.354206e-38   80.90205
11    PTTG1        1.0686504  1.1477022   17.426922  3.113348e-66  1.627881e-62  136.90420
12    ZBTB20      -1.0380624  1.4574735  -15.686479  2.825661e-54  7.955540e-51  109.46464
13    LRBA        -1.0210120  2.6127848  -17.302823  2.415707e-65  1.105216e-61  134.86168
14    MIR924HG    -0.9673563  1.6771880  -12.422634  5.826470e-35  6.879181e-32   65.20672
15    INPP4B      -0.9477302  1.7127337  -13.294340  1.029226e-39  1.506828e-36   76.08759
16    SNHG3        0.9329528  1.3746169   16.643203  1.040811e-60  3.463157e-57  124.22549
17    EIF4G3      -0.8830465  1.8522299  -15.116481  1.318710e-50  3.217741e-47  101.05004
18    HIST1H4C    -0.8687174  2.2538890   -8.148676  4.522432e-16  1.190831e-13   22.10884
19    HBD         -0.8567509  0.9332628   -7.191580  7.267651e-13  1.047257e-10   14.84506
20    AC008014.1  -0.8518681  1.1810701  -12.807045  5.085818e-37  6.648073e-34   69.91945
A data.frame: 20 × 7
      ID          logFC       AveExpr    t          P.Value       adj.P.Val     B
      <chr>       <dbl>       <dbl>      <dbl>      <dbl>         <dbl>         <dbl>
126   CDV3         0.4613797  1.2229999   9.938467  4.442713e-23  2.665701e-20  38.0565977
247   CD84        -0.3533260  0.4955969  -7.186041  7.565594e-13  1.085915e-10  14.8056108
348   CD164        0.3068021  1.8707070   5.665244  1.543257e-08  9.336323e-07   5.1061422
405   CD36        -0.2896939  0.3246596  -5.085077  3.797517e-07  1.676634e-05   2.0035589
421   CD82        -0.2824216  0.5675827  -6.556442  6.018063e-11  5.985519e-09  10.5158853
508   CD63        -0.2598229  2.2570242  -4.428176  9.688833e-06  2.674366e-04  -1.1086749
513   CD74         0.2591115  2.2728535   3.323134  8.959829e-04  1.289562e-02  -5.3802127
571   CD48         0.2480557  0.8941089   4.562841  5.157164e-06  1.517342e-04  -0.5054093
577   HACD3        0.2477402  1.1536948   5.238582  1.678891e-07  8.128185e-06   2.7922191
816   CD37        -0.2100434  1.4375666  -4.061703  4.939885e-05  1.101796e-03  -2.6594795
940   CD44         0.1974135  1.2309836   3.383897  7.196813e-04  1.079551e-02  -5.1768245
956   CDT1        -0.1961757  0.7378909  -4.642021  3.530970e-06  1.131519e-04  -0.1423327
1012  CD34         0.1893594  1.3150180   3.232176  1.235812e-03  1.669593e-02  -5.6778086
1022  CD38        -0.1886832  0.7399566  -3.793207  1.503217e-04  2.855478e-03  -3.7111988
1040  CD99         0.1868761  1.4580142   3.199527  1.384411e-03  1.840568e-02  -5.7826245
1207  CD55        -0.1723566  0.5686104  -3.611420  3.072609e-04  5.290830e-03  -4.3826659
1280  CD58        -0.1668748  0.5847240  -4.324458  1.556313e-05  4.052797e-04  -1.5610683
1339  BICD1       -0.1613899  1.0190811  -3.251286  1.155817e-03  1.577332e-02  -5.6159664
1683  CD2AP        0.1395765  0.9274637   3.125345  1.785289e-03  2.247794e-02  -6.0168337
1713  CD47        -0.1384016  1.2640750  -2.926076  3.446824e-03  3.835731e-02  -6.6188598

Characterizing and interpreting these populations here would get unwieldy, so I have done so in an external slide deck, available at this link:

https://docs.google.com/presentation/d/1Jd8VXRSZ-4r_1EfP6tuxpk56tDw5Yt-8lMsy-ucJkSI/edit?usp=sharing

However, I did relabel the populations according to their subsets. Here's the final result: